data profiling-best-practices

White Paper: Data Profiling Best Practices

Upload: blaise-cheuteu

Post on 08-Jan-2017


Category: Data & Analytics



TRANSCRIPT

Page 1: Data profiling-best-practices

White Paper

Data Profiling Best Practices


Page 2: Data profiling-best-practices


Overview

This white paper provides an overview of best practices with data

– Examines the best scenarios for

Why Use Data Profiling Technologies?

Deployment of Data Profiling Technologies

Data Quality Management


Page 3: Data profiling-best-practices


Data Integration


Page 4: Data profiling-best-practices


Data Profiling Process

Prepare for the Project


Analysis Preparation

Review Project Initiation Document

Page 5: Data profiling-best-practices


Current Documentation


Team Training

Internal Setup/Decisions


Profiling Overview

[Figure: Profiling Overview. Phase labels: Project Preparation, Analysis Preparation, Analysis, Sampling, Extract & Format. Step labels: Project Initiation Document, Project Preparation, Extract & Format, Analyze Samples, Profiling.]

Page 6: Data profiling-best-practices


Activity Workflow


Extract and Format the Data


Page 7: Data profiling-best-practices


Create the Extract Program(s)

Load Preparation


Sampling


Load a Sample of the Data

Analysis of the Sample

CSV: Each field is separated by a comma, and text fields are enclosed within quotes. Generally this type of file allows the first row to contain the names of the columns.

CSV File Definition: Some products require or allow you to create definition rules for CSV files. This is helpful for adding or changing column names or adding descriptions to the attributes.

Flat File Definition: Varies based on the data profiling product chosen, from a flattened copybook (or the equivalent for the language used) to predefined formats specific to the tool itself.

ODBC Connection: Open Database Connectivity, a standard database access method developed by Microsoft Corporation. The goal of ODBC is to make it possible to access any data from any application, regardless of which database management system (DBMS) is handling the data.
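As an illustration of the CSV conventions described above (comma-separated fields, quoted text fields, column names in the first row), the following is a minimal sketch using Python's standard `csv` module. The sample data, the column names, and the helper `read_csv_rows` are hypothetical and are not part of any particular profiling product.

```python
import csv
import io

# Hypothetical sample extract: the first row names the columns, fields are
# comma-separated, and text fields are quoted so they may contain commas.
SAMPLE = (
    '"customer_name","ssn","city"\n'
    '"Smith, John","123-45-6789","Lanham"\n'
)

def read_csv_rows(text):
    """Parse CSV text whose first row holds the column names."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)

rows = read_csv_rows(SAMPLE)
print(rows[0]["customer_name"])  # the quoted comma stays inside the field
```

Reading the header row into column names keeps the sample self-describing, which is the role the text assigns to the first row of a CSV file.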

Page 8: Data profiling-best-practices


Adjust the Extracts and Formats of the Data


Produce Deliverables

Delete the Samples

Analysis

Analysis Assistant


>> Code


Blanks/Nulls/Low Values/High Values

Page 9: Data profiling-best-practices


Minimums/Maximums

Patterns


Duplicates / Inconsistencies

Invalid Codes

Identify Keys

Key Testing

Join Testing

Low Value: 000-00-0000
NULL
High Value: 999-99-9999

System      Minimum         Maximum
System 1    000-00-00001
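A check for blanks, nulls, low values, and high values, followed by a minimum/maximum check, might be sketched as below. This is a minimal illustration, not a product feature: the function name `profile_column` is hypothetical, the low and high placeholder values are taken from the example values shown here, and the input list is invented for the demonstration.

```python
# Placeholder values taken from the example; real systems may use others.
LOW_VALUE = "000-00-0000"
HIGH_VALUE = "999-99-9999"

def profile_column(values):
    """Count nulls, blanks, and low/high placeholders; find min/max of the rest."""
    counts = {"null": 0, "blank": 0, "low": 0, "high": 0}
    real = []
    for v in values:
        if v is None:
            counts["null"] += 1
        elif v.strip() == "":
            counts["blank"] += 1
        elif v == LOW_VALUE:
            counts["low"] += 1
        elif v == HIGH_VALUE:
            counts["high"] += 1
        else:
            real.append(v)
    # Minimum and maximum are compared as strings (lexicographic order).
    counts["minimum"] = min(real) if real else None
    counts["maximum"] = max(real) if real else None
    return counts

# Hypothetical column contents for illustration only.
result = profile_column(
    ["123-45-6789", None, "", "000-00-0000", "999-99-9999", "000-00-0001"]
)
print(result)
```

Excluding the placeholder values before computing the minimum and maximum keeps them from masking the real range of the data.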

System      Values         Pattern
System 1    123-45-6789    9(3)-9(2)-9(4)
System 1    12-3456789     9(2)-9(7)
System 2    123456789      9(9)
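The pattern column in this example can be derived mechanically from the values. The sketch below is one possible way to do it, assuming a notation in which every run of n digits is written 9(n) and punctuation is kept literally; the A(n) form for letters and the function names `char_class` and `to_pattern` are assumptions added for illustration.

```python
from collections import Counter
from itertools import groupby

def char_class(ch):
    """Map a character to its pattern class: digit, letter, or itself."""
    if ch.isdigit():
        return "9"
    if ch.isalpha():
        return "A"  # assumed notation for letters; not shown in the example
    return ch

def to_pattern(value):
    """Collapse runs of digits/letters into 9(n)/A(n); keep other characters."""
    parts = []
    for cls, run in groupby(value, key=char_class):
        n = len(list(run))
        if cls in ("9", "A"):
            parts.append(f"{cls}({n})")
        else:
            parts.append(cls * n)
    return "".join(parts)

values = ["123-45-6789", "12-3456789", "123456789"]
pattern_counts = Counter(to_pattern(v) for v in values)
for pattern, count in pattern_counts.items():
    print(pattern, count)
```

Counting values per pattern makes inconsistent formats for the same attribute visible at a glance, which is the purpose of the pattern analysis listed above.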

System      Values
System 1    123-45-6789
System 1    123-45-6789

System      Values
System 1    123-45-6789
System 1    123-45-6789
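A duplicate check like the one illustrated here, where the same value appears twice in one system, might be sketched as follows. The function name `find_duplicates` and the third record are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

def find_duplicates(records):
    """Return {value: [systems]} for every value that occurs more than once.

    records is an iterable of (system, value) pairs.
    """
    seen = defaultdict(list)
    for system, value in records:
        seen[value].append(system)
    return {value: systems for value, systems in seen.items() if len(systems) > 1}

# The first two records mirror the example; the third is invented.
records = [
    ("System 1", "123-45-6789"),
    ("System 1", "123-45-6789"),
    ("System 2", "987-65-4321"),
]
dups = find_duplicates(records)
print(dups)
```

A column intended to serve as a key would be expected to return an empty result from such a check.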

Page 10: Data profiling-best-practices


Outputs

Page 11: Data profiling-best-practices


Page 12: Data profiling-best-practices


For more information about our products and services, please visit our website at www.g1.com or call us today at 888-413-6763.

4200 Parliament Place, Suite 600, Lanham, MD 20706-1844
1-888-413-6763 • www.g1.com

Group 1, Group 1 Software and the Group 1 logo are registered trademarks of Group 1 Software, Inc. Pitney Bowes and the Pitney Bowes logo are registered trademarks and the Pitney Bowes Process Bar Design is a trademark of Pitney Bowes Inc. Group 1 Software is a Pitney Bowes company. All other marks referenced in this material are the property of their respective owners.

© 2007 Group 1 Software, Inc. All rights reserved.

An Equal Opportunity Employer. Printed in U.S.A.