Page 1: Gettingstarted Guide

Getting Started with Ataccama DQ Analyzer 7 pg. 1

Getting Started with DQ Analyzer 7

Copyright © 2010 Ataccama Corporation

Contents

1. Introduction
   Other Help Resources

2. User Interface Overview
   The Explorer Panel
   The Editing Area
   The Status Panel
   Configuration Dialogs

3. Creating a Profile
   The Profiling Step Editor
      General Category
      Input Category

4. Reading a Profile
   Introduction
   Column Analyses
      Basic Analyses
      Interpreting Counts
      Frequency
      Domain Analysis
      Mask Analysis
      Quantiles
      Group Frequency Analysis
   Inputs and Roll Ups
   Primary Keys
   Foreign Keys
   Business Rules
   Dependency Analysis

5. Working with Plan Files
   The Plan Editor
      Building a Plan
      Editing Steps
      Comments
   Working with Data Files
      Viewing Data Files
      Using Data Files in a Plan
   Tips for Using Steps
      Description of Steps
      Using Functions
      Using Regular Expressions
   Debugging and Running a Plan

6. Connecting to a Database

1. Introduction

Ataccama DQ Analyzer 7 is a comprehensive tool for Data Profiling. This guide is intended to provide an overview of the basic functionality of the product and describe how to perform common functions. Some of the highlights of this guide include:

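How to Profile a text file in under 30 seconds (Chapter 3 - Creating a Profile)

How to read and interpret Profiling results (Chapter 4 - Reading a Profile)

Short descriptions of the included algorithms (Steps) (Chapter 5 - Working with Plan Files, Tips for Using Steps)

How to connect to a database for additional profiling capabilities (Chapter 6 - Connecting to a Database)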

Other Help Resources

Video Tutorials

There are video tutorials available which demonstrate how to perform common tasks in DQ Analyzer. There are links to these tutorials from the Welcome Screen.

Tutorial Files

DQ Analyzer includes sample Profiling projects which contain pre-built, runnable configurations.

Help Files

For help on specific functions or features not covered in this guide or the resources mentioned above, extensive documentation is available in the product Help (available in Help → Help Contents in the toolbar).

2. User Interface Overview

DQ Analyzer is a development tool built on the Eclipse framework, so it is similar in structure and behavior to many Integrated Development Environments (IDEs). The user interface consists of four main areas:

1. The explorer panel
2. The main editing area
3. The status panel
4. Configuration dialogs (not shown)

The Explorer Panel

The explorer panel is where project files and database connections are shown. Many action shortcuts are available by right-clicking on objects in this panel, such as creating new files or connecting to a database.

The Editing Area

The editing area is where Profiles are shown. It is also where Plan files are created and edited. These will be described in detail in subsequent sections.

The Status Panel

The status panel contains a section that displays any problems that DQ Analyzer detects in Plan files. Clicking on a problem in the list will open the component that contains the problem. This area is also used to show the progress of generating a Profile or running a Plan.

Configuration Dialogs

There are various dialogs used in DQ Analyzer to configure the different components, similar to the one shown below. These dialogs are typically invoked by double-clicking Steps or via the context (right-click) menu.

3. Creating a Profile

To create a Profile, click on the “Create a Profile” link from the Welcome Screen or select New → Profile from the toolbar or context menu (as shown below).

You will first be given the option to create a Profile from a file or from a database table.

In order to profile a database table, you must have a database configured (see Chapter 6 - Connecting to a Database to learn how to do this). In this guide we will demonstrate the process of profiling a text file.

Choose the file you would like to profile and click Next >.

Note: You can skip these steps by right-clicking on a text file or database table and selecting “Create Profile…”. This will take you directly to the next step.

You may need to assign metadata to your file to describe how it is formatted. If there is no metadata associated with the file, the following screen will appear.

If the format shown is not correct, click on the Advanced… button to customize the format and appearance of your data. Once the appropriate metadata has been assigned, the profile configuration step will appear.

This panel allows you to configure the profile that you will create. You can specify where to create the profile as well as which columns to profile. Drill-through functionality allows you to see the individual records that make up the generated statistics (database connection required).

Finally, there is the option to create a Profile or a Plan file. Selecting the Profile option will generate the Profile immediately using the settings specified. Selecting the Plan option will create a Plan that can be run to generate a Profile (see Chapter 5 - Working with Plan Files for more information on Plans). This option is useful if you wish to modify or filter the data before profiling it, or if you want to do some advanced configuration of the profiling algorithm (such as adding business rules or performing primary key analysis).

If you select Profile and click Finish, the Profile will be generated and opened in the Profile Viewer. See Chapter 4 - Reading a Profile to learn how to read the data contained in the Profile.

If you select the option to create a Plan file, a file will be created that looks something like this:

There are two steps in the Plan, one for reading the data and another to generate the Profile. You can double-click on either of them to perform additional configuration. To create the Profile, click the Run button in the toolbar.

If you want to modify the Profile that is created, you can double-click on the Profiling step to open the Profiling step editor. Here you can edit the existing configuration or add additional analyses to run.

The Profiling Step Editor

Double-clicking on the Profiling step will open the Profiling Step Editor, which can be used to configure the output sent to the generated profile file.

There are two categories of settings in the Profiling Step Editor: General settings, which apply to all inputs; and Input-specific settings that apply separately to each input. To add a new input to the Profiling step, click the Add button above the category list. The adjacent Remove button can be used to delete an input.

General Category

Basic Tab

The Basic tab contains fields for specifying the step name, output file name and location and default locale for the generated files.

Masks Tab

The Masks tab contains the masks that have been defined and is where new masks can be created and edited. A mask is a way of showing the structure of the data rather than the content of the data. Codes are used instead of the actual characters in the data to describe these patterns. For example, the mask "W" can be used to represent a word (the number of letters required to make a word can be defined in the Profiling Step properties), while "L" is used to represent a letter. The codes and rules for the masks can be defined here.

Each mask configuration contains the following settings:

Characters – the type (or types) of characters that should be masked. The pre-defined types are [:all:], [:letter:], [:lowercase:], [:uppercase:], [:digit:], and [:white:], where [:white:] means all whitespace characters (such as spaces). Multiple types can be used in the same mask (e.g., [:digit:][:white:]). Characters that aren’t masked are shown as they appear in the data when the Copy others checkbox is checked. For example, if only numbers are masked, the Mask results could show “# main street” in an address field, where # is the mask for a sequence of digits.

Symbol – the symbol used to represent a single Character, such as “L” for a letter.

Repeated Symbol – the character used to represent a sequence of characters defined in the Characters field, such as “W” for a sequence of more than 2 letters.

Repeated Threshold – the number of Characters that constitutes a sequence, for example the number of letters required to make a word.

The Repeated Symbol and Repeated Threshold fields may be left blank to mask all characters individually regardless of the sequence length.

To create a new mask, click the + button at the bottom of the list of masks. To delete a mask, click the – button. A mask may contain multiple masking rules (called character groups), as shown above. To add a new character group, click the Add button.

Drill-through Tab

The Drill-through tab specifies whether drill-through functionality should be used. Enabling drill-through makes it possible to inspect the individual records that make up the generated statistics and other measures shown in the Profile viewer. Because it requires storing additional fields, it requires a database connection. To allow drill-through in the generated profile, click the Enable Drill-through checkbox. Then specify the database name and, optionally, a table prefix. For a list of available databases, use the content-assist functionality (invoked by ctrl-space) inside the Database Name field.

When drill-through is enabled, it can be used by right-clicking on many of the statistics shown in the profile viewer (e.g. Median value) and selecting Drill-through.

Foreign Keys Tab

Foreign Key analysis can be performed when there are two or more different inputs connected to the same Profiling step. Create a new Foreign Key analysis by clicking the + button at the bottom of the list, similar to creating a new Mask analysis. Then enter the names of the inputs to analyze in the Left Input Name and Right Input Name fields. Content-assist (ctrl-space) can be used to list the inputs of the step. Then use the Components fields to enter the column(s) from each input to analyze.

Input Category

The Input category contains settings that are specific to each Profiling step input. To create a new input, click the Add Input button in the upper-left corner of the dialog. This will add a new connection point to the step icon in the Plan editor so that a new input (e.g. Text File Reader or JDBC Reader step) can be connected.

Data Tab

The Data tab shows all the data that will be profiled. It also contains the individual settings for each column to be profiled. By default the settings that are defined in the Create Profile wizard are applied to all columns, but this tab allows configuration of each column separately.

When connected to an incoming step (such as Text File Reader), the Fill Columns… button can be used to automatically add columns that are connected to the input, rather than manually typing the incoming column names. The Type column uses the types defined in the originating input step (e.g. Text File Reader) and is for reference only. To change the data type of a column, use the Metadata Editor or input step.

Dependencies Tab

The Dependencies tab can be used to define an analysis to test the dependency of the fields in one column on the fields in other columns, such as whether birth number is related in some way to birth date. For more details on dependency analysis, see the Help page for the Profiling step.

Roll Ups Tab

A Roll Up is a way to look at a specific subset of the profiled data. Entering a column name in the Roll Ups Expression field will create a list of separate profile analyses for each value of that column. For example, if the data contains a column of countries, creating a roll up of the country column will allow viewing profile results for each country listed in that column. When a roll up is defined, the Inputs and Roll Ups panel will be shown in the Profile Viewer.
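The idea behind a roll up can be sketched outside the product. The following Python snippet is only an illustration, not how DQ Analyzer implements roll ups: the function names `roll_up` and `row_counts` and the sample records are hypothetical, and a simple record count stands in for a full profile of each group.

```python
from collections import defaultdict

def roll_up(records, column):
    """Group records by the value found in the roll-up column."""
    groups = defaultdict(list)
    for record in records:
        groups[record[column]].append(record)
    return dict(groups)

def row_counts(groups):
    """Stand-in for a full profile: count the records in each group."""
    return {value: len(rows) for value, rows in groups.items()}

# Hypothetical sample data with a country column.
customers = [
    {"name": "Ana", "country": "CZ"},
    {"name": "Ben", "country": "CA"},
    {"name": "Eva", "country": "CZ"},
]
print(row_counts(roll_up(customers, "country")))  # {'CZ': 2, 'CA': 1}
```

Each key of the returned dictionary corresponds to one entry that would be listed for the country roll up.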

Business Rules Tab

A business rule is a Boolean expression that is evaluated and its results are presented in the Profile Viewer. Some examples are shown below.

See the Expressions Help page for a description of available expressions and their usage.

Primary Keys Tab

To analyze the uniqueness of a particular column and determine whether it is a primary key, add a new primary key analysis and enter the name of the column to analyze in the Expression field of the Components section.

4. Reading a Profile

Introduction

The Profile Viewer contains several tabs and windows, which are described below. The data can be exported to XML or HTML format by using the Export… button above each table. For information on how to create a profile, see Chapter 3 - Creating a Profile.

Column Analyses

The Column analyses tab presents statistical analyses and pattern information about the columns that have been profiled. Each column in the input data is listed as a row in the table, which presents information such as data type, value counts, and minimum/maximum values.

Basic Analyses

The Basic tab provides simple statistics about the data that has been profiled and shows a chart of duplicate and distinct data as a percentage of the whole.

Interpreting Counts

The Counts table lists the following values:

Null: all data that are empty or have "Null" as their value.

Non-null: all data that are not empty or null (duplicate + distinct)

Duplicate: the number of values that are the same as other values in the list

Distinct: the number of non-null values that are different from each other (non-unique + unique)

Non-unique: the number of values that have at least one duplicate in the list

Unique: the number of values that have no duplicates

To illustrate the meaning of these values, take the following data as an example.

Record No.  Value
1           John Smith
2           John Smith
3           Rebecca Davis
4           Paul Adams
5           (empty)

The Counts table for this data would be as follows:

Type        Count  Records          Explanation
Null        1      Record 5         The last record is empty.
Non-null    4      Records 1-4      The first 4 records contain data.
Duplicate   1      Record 2         There is one duplicate of the John Smith record.
Distinct    3      Records 1, 3, 4  Three different values occur: John Smith, Rebecca Davis and Paul Adams.
Non-unique  1      Record 1         John Smith has a duplicate record, therefore it isn't unique.
Unique      2      Records 3 and 4  Rebecca Davis and Paul Adams appear only once in the list; they have no duplicates.
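The relationships between these counts can be checked with a short Python sketch. This is an illustration of the definitions given above, not the product's code; the function name `count_summary` is hypothetical, and treating `None` and the empty string as null is an assumption.

```python
from collections import Counter

def count_summary(values):
    """Compute the six counts described in the Counts table."""
    # Assumption: None and "" are treated as null values.
    non_null = [v for v in values if v not in (None, "")]
    frequency = Counter(non_null)
    distinct = len(frequency)
    unique = sum(1 for n in frequency.values() if n == 1)
    return {
        "null": len(values) - len(non_null),
        "non_null": len(non_null),            # duplicate + distinct
        "duplicate": len(non_null) - distinct,
        "distinct": distinct,                 # non-unique + unique
        "non_unique": distinct - unique,
        "unique": unique,
    }

sample = ["John Smith", "John Smith", "Rebecca Davis", "Paul Adams", ""]
print(count_summary(sample))
# {'null': 1, 'non_null': 4, 'duplicate': 1, 'distinct': 3, 'non_unique': 1, 'unique': 2}
```

The result matches the example table: non-null (4) equals duplicate (1) plus distinct (3), and distinct (3) equals non-unique (1) plus unique (2).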

Frequency

The Frequency Analysis tab shows the number of times each value in the data occurs (shown as both an absolute count and as a percentage of the whole).
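As a rough illustration of what a frequency analysis computes, the following Python sketch counts each value and expresses the count as a percentage of all values. The function name `frequency_table` is hypothetical and the ordering (most frequent first) is an assumption, not a statement about how the tab sorts its rows.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count, percentage) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [(value, n, 100.0 * n / total) for value, n in counts.most_common()]

for row in frequency_table(["CZ", "CZ", "CA", "US"]):
    print(row)
```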

Domain Analysis

This is an analysis to determine the likely type of the data in each column (whether the data is text, a number, or a date, for example). The probable types are listed, along with exceptions (such as a text string found in a list of dates, for example).
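A simplified way to picture this analysis is to guess a type for every value, take the most common guess as the likely domain, and report everything else as an exception. The Python sketch below does exactly that under stated assumptions: the names `guess_type` and `domain_summary` are hypothetical, only the date format YYYY-MM-DD is recognized, and the real product may use entirely different rules.

```python
from collections import Counter
from datetime import datetime

def guess_type(text):
    """Guess whether a string looks like an integer, a float, a date or plain text."""
    for label, parse in (("integer", int), ("float", float)):
        try:
            parse(text)
            return label
        except ValueError:
            pass
    try:
        # Assumption: dates are written as YYYY-MM-DD.
        datetime.strptime(text, "%Y-%m-%d")
        return "date"
    except ValueError:
        return "string"

def domain_summary(values):
    """Return the most common guessed type and the values that do not fit it."""
    likely = Counter(guess_type(v) for v in values).most_common(1)[0][0]
    exceptions = [v for v in values if guess_type(v) != likely]
    return likely, exceptions

print(domain_summary(["2010-01-05", "2010-02-17", "unknown", "2010-03-01"]))
```

Here the text string "unknown" would be reported as the exception in a column that otherwise looks like dates.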

Mask Analysis

The Mask Analysis tab shows the syntactic patterns of the data, i.e. the structure of the data rather than the content of the data. Codes (“masks”) are used to describe these patterns. For example, the code “W” is used by default to represent a word (the number of letters required to make a word can be defined in the Profiling Step properties), while “L” is used to represent a letter. This type of analysis can be useful when, for example, looking at a column of names, where one or two words are common, but single letters and numbers are not. Finding unexpected patterns in the data can provide information about the overall level of quality of the data.
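To make the idea of a mask concrete, here is a minimal Python sketch that turns values into patterns. It is an approximation for illustration only: the function name `mask_value` is hypothetical, the threshold of "more than 2 letters" for a word and the use of "#" for a run of digits are assumptions, and only ASCII letters and digits are handled.

```python
import re

def mask_value(value, word_threshold=2):
    """Replace letters and digits with mask symbols and copy other characters."""
    def mask_letters(match):
        run = match.group(0)
        # A run longer than the threshold becomes one "W" (word);
        # shorter runs are masked letter by letter with "L".
        return "W" if len(run) > word_threshold else "L" * len(run)

    # Collapse each run of digits into a single "#" first, then mask letters.
    masked = re.sub(r"[0-9]+", "#", value)
    return re.sub(r"[A-Za-z]+", mask_letters, masked)

for name in ["John Smith", "J. Smith", "42 Main St"]:
    print(name, "->", mask_value(name))
```

In a column of names, most values would produce patterns such as "W W", so a pattern containing "#" or a lone "L" stands out as worth a closer look.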

Quantiles

The Quantiles tab displays the data values that occur at designated intervals in the ordered data set. The first value in the list is at 0% and the last value is at 100%. The median value is at the 50% marker.
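The following Python sketch shows one simple way to pick values at regular intervals from an ordered data set. It is illustrative only: the function name `quantile_values` is hypothetical, and the nearest-position rule used here is an assumption, since the guide does not say how the product selects a value between two positions.

```python
def quantile_values(values, steps=4):
    """Return (percent, value) pairs taken at evenly spaced positions."""
    ordered = sorted(values)
    last = len(ordered) - 1
    result = []
    for i in range(steps + 1):
        # Nearest position to i/steps of the way through the ordered data.
        index = (i * last + steps // 2) // steps
        result.append((100 * i // steps, ordered[index]))
    return result

print(quantile_values([7, 1, 9, 3, 5, 2, 8, 4, 6]))
# [(0, 1), (25, 3), (50, 5), (75, 7), (100, 9)]
```

The entry at 0% is the smallest value, the entry at 100% is the largest, and the entry at 50% is the median.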

Group Frequency Analysis

The Groups tab presents a different analysis of the data in the Frequency tab. It shows the number of times that each non-null frequency count is repeated. If all values are unique, the group size will be 1, as there are no duplicate values. Each time a value is repeated, it forms a new group.
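A group frequency analysis can be thought of as a count of counts. The Python sketch below illustrates this under assumptions: the function name `group_sizes` is hypothetical, and `None` and the empty string are treated as null and skipped.

```python
from collections import Counter

def group_sizes(values):
    """Map each group size to the number of distinct values with that size."""
    # First count how often each non-null value occurs...
    per_value = Counter(v for v in values if v not in (None, ""))
    # ...then count how often each of those frequencies occurs.
    return dict(sorted(Counter(per_value.values()).items()))

sample = ["John Smith", "John Smith", "Rebecca Davis", "Paul Adams", ""]
print(group_sizes(sample))  # {1: 2, 2: 1}
```

For this sample, two values occur once (group size 1) and one value occurs twice (group size 2).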

Inputs and Roll Ups

The Profiling Step may take any number of inputs, which are shown in this panel (if there is more than one input). Additionally, each input may have any number of “roll ups” assigned to it - ways of grouping the data by specific parameters. For example, roll ups could be used to view data profiles by gender or country.

Primary Keys

When configured in the Profiling Step properties, the Primary Keys tab is shown and analyzes the uniqueness of designated keys.
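In essence, a primary key check asks whether every value of the designated key is present and occurs only once. A minimal Python sketch of that test follows; the function name `is_primary_key_candidate` is hypothetical, and treating `None` and the empty string as missing is an assumption.

```python
def is_primary_key_candidate(values):
    """True if no value is missing and no value occurs more than once."""
    values = list(values)
    no_nulls = all(v not in (None, "") for v in values)
    return no_nulls and len(set(values)) == len(values)

print(is_primary_key_candidate(["A1", "A2", "A3"]))  # True
print(is_primary_key_candidate(["A1", "A1", "A3"]))  # False
```

A key built from several columns could be tested the same way by passing tuples of column values.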

Foreign Keys

When configured in the Profiling Step properties, the Foreign Keys tab is shown and analyzes whether the key from one input can be considered a foreign key in relation to the other (parent) entity coming from a second input.
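One way to reason about a foreign key relationship is to look for "orphans": key values in one input that have no match in the parent input. The Python sketch below illustrates that check; the function name `orphan_keys` and the sample values are hypothetical and not taken from the product.

```python
def orphan_keys(child_values, parent_values):
    """Return the child key values that do not appear among the parent keys."""
    parents = set(parent_values)
    return sorted({v for v in child_values if v not in parents})

# Hypothetical inputs: customer ids on orders vs. ids in the customer table.
order_customer_ids = [1, 2, 2, 5]
customer_ids = [1, 2, 3]
print(orphan_keys(order_customer_ids, customer_ids))  # [5]
```

An empty result means every child value has a matching parent, which is what would be expected of a valid foreign key.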

Business Rules

When configured in the Profiling Step properties, this tab is shown and displays the results of the evaluation of any number of Boolean expressions relating to the input data.

The example below (taken from the Advanced Profiling sample) shows a business rule that checks the length of each SIN number and tests whether it is 9 digits in length. It evaluates to true if the length is 9 digits and false otherwise.
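Expressed in Python rather than in the product's own expression language, a rule of this kind might look like the sketch below. The function names `sin_is_nine_digits` and `rule_summary` are hypothetical, the sample values are made up, and the extra check that all characters are digits is an assumption that goes slightly beyond a pure length test.

```python
def sin_is_nine_digits(sin):
    """True if the value is present and consists of exactly 9 digits."""
    return sin is not None and len(sin) == 9 and sin.isdigit()

def rule_summary(values):
    """Count how many values satisfy the rule and how many do not."""
    passed = sum(1 for v in values if sin_is_nine_digits(v))
    return {"true": passed, "false": len(values) - passed}

print(rule_summary(["046454286", "12345", None, "12345678X"]))
# {'true': 1, 'false': 3}
```

The summary corresponds to the kind of true/false breakdown a business rule produces for a column.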

Dependency Analysis

Dependency Analysis discovers whether values of Dependants (selected columns or expressions) depend on the value of a Determinant (one or more columns combined into a single key). Each group of records with the same Determinant value is examined, and if the most frequent Dependant value is present in at least a certain percent of records, the whole group is considered to be dependent. Otherwise the whole group is considered not to be dependent.
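The group-by-group test described above can be sketched in Python as follows. This is a simplified illustration with a single Determinant column and a single Dependant column; the function name `dependent_groups`, the sample records, and the 90% threshold are assumptions rather than product defaults.

```python
from collections import Counter, defaultdict

def dependent_groups(records, determinant, dependant, threshold=0.9):
    """For each Determinant value, report whether its group counts as dependent."""
    groups = defaultdict(Counter)
    for record in records:
        groups[record[determinant]][record[dependant]] += 1
    result = {}
    for key, counts in groups.items():
        # Share of the group taken by the most frequent Dependant value.
        top_count = counts.most_common(1)[0][1]
        result[key] = top_count / sum(counts.values()) >= threshold
    return result

records = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "20002", "city": "Washington"},
    {"zip": "20002", "city": "Springfield"},
]
print(dependent_groups(records, "zip", "city"))
# {'10001': True, '20002': False}
```

In this made-up data, the city is fully determined by the first zip code, while the second zip code maps to two different cities and so its group is not considered dependent.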

5. Working with Plan Files

A Plan file defines the logic and rules to be applied to the input data in order to produce the desired output. Plans are created by placing Steps onto a canvas and connecting them together. Steps are data processing algorithms that can be used to read, transform and analyze data, among other actions.

To create a new Plan file, select New → Plan by right-clicking on a project or folder in the explorer panel (or from the File menu or Toolbar).

The Plan Editor

The Plan editor consists of a canvas where the Plan logic is defined (by connecting Steps together) and a palette where the various Steps are listed.

Building a Plan

To start building a Plan, drag a Step from the palette and drop it on to the canvas.

Connect Steps together by dragging from the "out" endpoint of one Step to the "in" endpoint of another, or by selecting the “Connection” object from the palette and then clicking the output of one Step followed by the input of another Step.

Editing Steps

Properties for each Step can be edited by double-clicking on the Step, or by right-clicking the Step and selecting Edit Properties....

For more information on the available Steps in DQ Analyzer, refer to the documentation pages for each Step.

Comments

Comments can be used to place notes or other information on the canvas or to place around a series of Steps to visually group them together.

Working with Data Files

Existing files can be added to DQ Analyzer for use as, for example, input data for a Plan. Files can be added by dragging and dropping them from the file system to the desired project in the Navigator panel, or by copying them from their source folder to the desired project folder inside the workspace folder in the file system.

To use an input file in a Plan, it must first be assigned metadata describing the format of the data. When a data file (e.g. .txt or .csv) is opened for the first time, the Metadata Editor is launched.

The metadata editor presents options for how to read the file, such as the type of delimiter used, the data types of each column, and whether the file contains header rows. The result data can be previewed in the lower panel of the editor to examine the results of the metadata settings. Clicking OK in the Metadata Editor will open the data file for viewing.
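The role of these metadata settings can be illustrated with Python's standard csv module. The sketch below is not related to how DQ Analyzer stores or applies metadata; the function name `read_delimited`, the generated `column_N` names and the sample text are hypothetical.

```python
import csv
import io

def read_delimited(text, delimiter=";", has_header=True):
    """Split delimited text into a list of column names and a list of rows."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return [], []
    if has_header:
        return rows[0], rows[1:]
    # Without a header row, generate placeholder column names.
    names = [f"column_{i + 1}" for i in range(len(rows[0]))]
    return names, rows

sample = "id;name\n1;John Smith\n2;Rebecca Davis\n"
header, data = read_delimited(sample)
print(header)   # ['id', 'name']
print(data[0])  # ['1', 'John Smith']
```

Reading the same text with the wrong delimiter would yield a single column per row, which is the kind of problem a preview of the parsed data makes visible.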

The file metadata can be edited later by right-clicking on the file and selecting Edit Metadata....

Viewing Data Files

Double-clicking a text (e.g. .txt or .csv) file will open it for viewing in the CSV Viewer. The CSV Viewer displays the data in rows and columns, as defined in the file metadata.

Sorting

To sort a column, click the name of the column in the header row. Clicking once will sort the data in ascending order (i.e. smallest-to-largest/A-to-Z), indicated by an up arrow. Clicking again will sort the data in descending order, indicated by a down arrow. Clicking a third time will remove all sorting and revert to the original ordering of the data, indicated by no arrow.

Filtering

To show only a subset of the data, click the View Settings button in the toolbar.

This will open the View Settings dialog, which contains a tab named Filter.

To define a filter, click the Add button. Use the drop-down controls to select a column to filter and a condition to apply (e.g. =, <, contains). Then specify the matching criteria. It is also possible to specify whether the filter should be case sensitive or not. This will display only rows matching the filter criteria.

Multiple filters can be defined to further refine the data that is shown. To remove a filter without deleting it, uncheck the Apply checkbox.

Data Coloring

By default, all data will be shown in black text on a white background using the default font settings. However, rules can be configured so that certain data values or ranges are colored or formatted differently. This can be useful for visually scanning for key values in a large data file, for example. The conditions are defined similarly to Filters, but there are additional options for coloring and text formatting (available via the Coloring button).

Additionally, there is the option to define whether the coloring rule should be applied only to the specific cell which matches the coloring rule or to all (or any subset) of the columns.

Coloring rules will be applied in the order in which they appear. The order can be changed using the buttons on the right, below the Add button.

Column Visibility

For data files with many columns it may be useful to hide certain columns to focus on specific data. This can be done in the Columns tab of the View Settings dialog. Uncheck a column to hide it from view. When columns are hidden, a note appears at the bottom of the CSV Viewer indicating the number of columns that are hidden and providing a quick link to show them all.

A column can also be hidden by right-clicking on it and selecting Hide Column.

Color-coding Column Headers

Many data files use standardized naming conventions to group similar columns. The View Settings dialog allows specifying different colors for column headers based on all or part of the column name. In the Heading tab, a column mask can be defined (e.g. “src*”), which will color all headings whose name starts with the text specified. A different background color can be specified for each mask that is used.
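The effect of a column mask such as “src*” can be imitated with wildcard matching from Python's standard fnmatch module. This is only an analogy: the function name `header_color`, the color names, the rule that the first matching mask wins, and the assumption that matching is case sensitive are all hypothetical.

```python
from fnmatch import fnmatchcase

def header_color(column_name, rules, default="white"):
    """Return the color of the first mask that matches the column name."""
    for mask, color in rules:
        if fnmatchcase(column_name, mask):
            return color
    return default

# Hypothetical masks and colors.
rules = [("src*", "yellow"), ("out*", "green")]
for name in ["src_name", "src_address", "out_name", "id"]:
    print(name, "->", header_color(name, rules))
```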

Resizing Columns

Column widths can be resized by dragging left or right when the mouse is placed over the column divider in the column header. Columns can be automatically sized to fit their contents by double-clicking the column divider.

Additionally, right-clicking on a column header will bring up a menu which offers, among other functions, the ability to “autofit” the selected column or all columns.

Mark Groups

Also available in the column header context menu is the ability to visually mark changes in data. This can be useful for visually scanning a specific column to look for changes in the data, for example.

The column whose groups are marked is indicated by an icon showing three parallel horizontal lines next to the column name (as shown above).

Saving Views

To preserve the view settings (including sorting, column widths, and marked groups) for later use, use the View Settings drop-down arrow to open the options menu.

Use the Save As… (for creating a new saved view) or Save (for saving changes to an existing view) options to store view settings. A list of recently used views will be shown at the top of the menu. A complete list of all saved views is shown in a submenu at the bottom labeled All Saved Views. An asterisk (*) next to the view name indicates that unsaved changes have been made to the current view.

To delete a view, select the Manage Views… option. It is also possible to import and export views for use with other copies of the product using this menu option.

The view called Default is a stored view with no settings applied. It cannot be changed or deleted. When changes have been made to the default view, the toolbar button label changes to <custom>, which indicates an unsaved view based on the default view. Click the Save As… option to name and store the new view.

The Edit… option is the same as clicking the toolbar button, which opens the View Settings dialog.

Using Data Files in a Plan

To use input files inside a Plan, add a Text File Reader Step and enter the input file name in the File Name property.

Alternatively, text files can be dragged from the explorer panel directly onto the canvas, where a Text File Reader will be generated after the metadata is created.

Tips for Using Steps

DQ Analyzer offers several steps and functions for constructing plan files. The algorithms and logic used to create a plan file will vary from project to project; an introduction to steps and functions is provided below.

Description of Steps

Steps can perform many types of functions, such as reading data, transforming data, and filtering and categorizing data. Below is an overview of the steps included with DQ Analyzer.

Step Name                  Step Description

Column Assigner            Assigns the result of an expression to a column.
Condition                  Directs data flow (true -> right, false -> left).
Text File Reader           Reads data from a text file.
Fixed Width File Reader    Reads data from a fixed-width text file.
JDBC Reader                Reads data from a JDBC (database) data source.
Excel File Reader          Reads files created by Microsoft Excel.
Profiling                  Performs a comprehensive analysis and writes it to a file (.profile).
Regex Matching             Parses the input string based upon regular expression capturing groups.
Splitter                   Creates a new record for each word in a defined expression.
Trash                      Discards data flow.
Union Same                 Like an SQL table union, but applies only if the flows are exactly the same.

A complete description of the Steps and their usage can be found in the product Help (Help → Help Contents in the toolbar menu).
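
As a rough mental model of two of the steps above, the sketch below imitates the Condition and Splitter steps in plain Python. It is not DQ Analyzer code: the function names `condition` and `splitter`, the sample records, and the use of whitespace as the word separator are all assumptions made for illustration.

```python
# Illustrative stand-ins only; these are not DQ Analyzer APIs.

def condition(records, predicate):
    """Route each record to one of two outputs, like the Condition step."""
    out_true, out_false = [], []
    for record in records:
        (out_true if predicate(record) else out_false).append(record)
    return out_true, out_false


def splitter(records, column):
    """Emit one new record per whitespace-separated word in `column`."""
    result = []
    for record in records:
        for word in record[column].split():
            # Copy the record and replace the column with a single word.
            result.append({**record, column: word})
    return result


# Hypothetical sample data.
records = [
    {"id": 1, "name": "John Smith"},
    {"id": 2, "name": "Cher"},
]

# Records whose name contains a space go to the "true" output.
multi_word, single_word = condition(records, lambda r: " " in r["name"])
split_records = splitter(multi_word, "name")
print(split_records)
```

The real steps are configured through their properties dialogs rather than written as code; the sketch only shows how records flow through them.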

Using Functions

There are many functions available in DQ Analyzer that can be used inside Steps. Some of the common functions are listed below.

Function    Description                                                                     Return Value(s)

matches     Full match: the entire input string must match the regular expression.          True/false
find        Partial match: the regular expression matches somewhere in the input string.    True/false
substr      Returns a substring of the input string; positions are counted from zero.       String
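
The difference between a full match, a partial match, and a zero-based substring can be illustrated with Python equivalents. This is a sketch of the behavior described in the table, not DQ Analyzer expression syntax; the argument order and the optional `length` parameter are assumptions, so check the product Help for the actual signatures.

```python
import re


def matches(pattern, text):
    """Full match: the whole input must fit the pattern."""
    return re.fullmatch(pattern, text) is not None


def find(pattern, text):
    """Partial match: the pattern may occur anywhere in the input."""
    return re.search(pattern, text) is not None


def substr(text, start, length=None):
    """Zero-based substring, optionally limited to `length` characters."""
    return text[start:] if length is None else text[start:start + length]


print(matches(r"\d{5}", "12345"))     # True: the whole string is five digits
print(matches(r"\d{5}", "ID 12345"))  # False: extra characters before the digits
print(find(r"\d{5}", "ID 12345"))     # True: five digits occur inside the string
print(substr("Ataccama", 0, 3))       # Ata
```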

Using Regular Expressions

DQ Analyzer supports the use of regular expressions for pattern matching. Some of the basic regular expressions are listed below.

Regular Expression    Matches

\d                    Digit
[A-Z]                 Uppercase letter
[a-z]                 Lowercase letter
\s                    Whitespace
. (dot)               Any character
?                     Preceding element zero or one time
+                     Preceding element one or more times
*                     Preceding element zero or more times
{2,6}                 Preceding element at least 2 times, at most 6 times
^                     Beginning of string
$                     End of string

For example, two regular expressions and their uses are shown below.

Regular Expression String                        Sample Usage

[A-Z][0-9][A-Z]\s?[0-9][A-Z][0-9]                Canadian postal code (e.g., A3A 9S9)
(\d{3} \d{2} \d{4}|\d{9}|\d{3}\-\d{2}\-\d{4})    US Social Security Number (123 45 6789 or 123456789 or 123-45-6789)
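
The two patterns above can be tried out with Python's `re` module, which accepts the same syntax for these constructs. This is a sketch, not DQ Analyzer code: the helper names are hypothetical, and the postal-code pattern is written without spaces between the character classes, on the assumption that a code such as A3A 9S9 contains at most the single optional space matched by `\s?`.

```python
import re

# Patterns taken from the table above.
POSTAL_CODE = re.compile(r"[A-Z][0-9][A-Z]\s?[0-9][A-Z][0-9]")
SSN = re.compile(r"(\d{3} \d{2} \d{4}|\d{9}|\d{3}\-\d{2}\-\d{4})")


def is_postal_code(value):
    """True if the whole value looks like a Canadian postal code."""
    return POSTAL_CODE.fullmatch(value) is not None


def is_ssn(value):
    """True if the whole value is in one of the three SSN layouts."""
    return SSN.fullmatch(value) is not None


for sample in ["A3A 9S9", "A3A9S9", "a3a 9s9"]:
    print(sample, is_postal_code(sample))

for sample in ["123 45 6789", "123456789", "123-45-6789", "123-45 6789"]:
    print(sample, is_ssn(sample))
```

Both patterns check only the shape of the value; neither verifies that the postal code or number was actually issued.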

Debugging and Running a Plan

Errors in the Plan will be shown in the Properties panel as the Plan is constructed.

Selecting an individual Step will show only the warnings and errors for that Step. Double-clicking on an error in the Properties panel will open the Step properties dialog to the field which contains the error. Individual Steps can also be debugged along the way by clicking the Debug button in the toolbar when a Step is selected or by right-clicking on a Step and selecting Debug.

To run a Plan, click on the Run button on the toolbar or right-click on the canvas and select Run from the context menu.

When a Plan is being run, its progress can be monitored in the Console window below the Plan editor.

A more detailed view of the plan execution progress will be shown when you click the Show Progress button in the Console window.

The execution progress will be shown inside the running plan.

6. Connecting to a Database

The following JDBC database drivers are included with DQ Analyzer (additional drivers can be added in the DB Drivers preferences):

Apache Derby, HSQLDB, Oracle, Microsoft SQL, PostgreSQL

To connect to one of these database types, right-click on the Databases node in the DQ Explorer and select New → Database Connection.

Selecting a driver name will populate the Connection Parameters section with the fields relevant to the selected database. When more advanced settings are necessary, an experienced user may also use the JDBC URL for connecting to the database.
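
For orientation, the sketch below lists the URL shapes commonly used by JDBC drivers for these databases and fills one in. The templates are assumptions based on the vendors' usual conventions, not taken from DQ Analyzer; the exact form depends on the driver and version installed, and the host, port, and database values are hypothetical.

```python
# Commonly used JDBC URL shapes (assumed; verify against the documentation
# of the driver that is actually installed).
JDBC_URL_TEMPLATES = {
    "Apache Derby": "jdbc:derby://{host}:{port}/{database}",
    "HSQLDB": "jdbc:hsqldb:hsql://{host}:{port}/{database}",
    "Oracle": "jdbc:oracle:thin:@{host}:{port}:{database}",
    "Microsoft SQL": "jdbc:sqlserver://{host}:{port};databaseName={database}",
    "PostgreSQL": "jdbc:postgresql://{host}:{port}/{database}",
}


def build_jdbc_url(driver, host, port, database):
    """Fill in the template for the given driver name."""
    return JDBC_URL_TEMPLATES[driver].format(
        host=host, port=port, database=database
    )


# Hypothetical connection details.
print(build_jdbc_url("PostgreSQL", "localhost", 5432, "sales"))
```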

After the database connection has been made, the database will be shown in the Databases node in the explorer panel. Clicking on the table names will show metadata for each table in the Properties panel. To view the results of an SQL query on a table, right-click on a table and select Open in SQL editor.

A default query will be shown, listing all table entries (grouped in batches if the number of rows is large). To change the query, edit the query text and click the Execute button. To retrieve more results from the query, click Next batch or Read rest (to show all results).
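
The batch behavior described above resembles fetching rows from a database cursor in chunks. The sketch below uses Python's built-in `sqlite3` module with an in-memory table as a stand-in; it does not use JDBC or DQ Analyzer, and the table name, sample rows, and batch size are hypothetical.

```python
import sqlite3

BATCH_SIZE = 3  # hypothetical batch size

# Build a small in-memory table to query.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
connection.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"customer {i}") for i in range(1, 9)],
)

# Comparable to the default query that lists all table entries.
cursor = connection.execute("SELECT * FROM customers ORDER BY id")

first_batch = cursor.fetchmany(BATCH_SIZE)  # rows shown initially
next_batch = cursor.fetchmany(BATCH_SIZE)   # comparable to "Next batch"
rest = cursor.fetchall()                    # comparable to "Read rest"

print(len(first_batch), len(next_batch), len(rest))
connection.close()
```

Fetching in batches keeps the first results quick to display even when the table is large, at the cost of extra requests if all rows are eventually needed.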

Refer to the documentation for the JDBC Reader Step to learn how to use data from a database inside a Plan file.

