Metadata Melodies Webinar with David Loshin Presentation

Harmonize or Differentiate? David Loshin, Knowledge Integrity, Inc. www.knowledge-integrity.com © 2014 Knowledge Integrity, Inc. (301) 754-6350

Upload: embarcadero-technologies

Post on 25-Dec-2014

DESCRIPTION

See the companion webinar at: http://embt.co/1uHXmjv

The ever-growing interest in data accumulation from multiple sources and organizations for reporting and analysis exposes a dirty secret: those business terms that we all think we understand actually have a wide variety of definitions. Sometimes these variances are largely irrelevant and do not significantly impact the ability to create a reasonable report. However, there are some instances in which even minor variations in structure, content, or semantics can have a significant impact on delivering trustworthy results. This leads to the question: if we have two different structures or definitions for what appear to be two similar concepts, should we harmonize the definitions and structures into one? In some cases this will be a good idea, and it will lead to increased consistency, but this is only true as long as the two concepts really refer to the same real-world idea. In other cases, the same terms are used for two different ideas, necessitating a division into two or more qualified business terms and definitions.

TRANSCRIPT

Page 1: Metadata Melodies Webinar with David Loshin Presentation

Harmonize or Differentiate?

David Loshin

Knowledge Integrity, Inc.

www.knowledge-integrity.com

© 2014 Knowledge Integrity, Inc. www.knowledge-integrity.com

(301)754-6350

Common Business Terms: Are They Really Common?

What is a "state"?

Challenges in Semantic Consistency

Data definitions are often biased around specific business function requirements

The meaning of a concept may slightly differ from application to application

Consolidation without considering semantics will lead to confusion downstream

"Location"

Municipal

Taxation

Mailing

Delivery

Utility

Evolution of the Business Metadata Glossary

Many sources of entity concepts and business terms may conflict with each other

The data steward must facilitate the collection and documentation of business terms

The data steward must also prepare for harmonization of terms

[Diagram: sources (Policies, System Docs, Processes, Models, Standards, Applications, Business Rules, Profiling, etc.) feed Entity Concepts and Business Terms; each term is paired with a Definition and a Contextual Meaning.]

Example – Identifying Business Terms

Order Confirmation

If you do not receive a confirmation number (in the form of a confirmation page or email) after submitting payment information, or if you experience an error message or service interruption after submitting payment information, it is your responsibility to confirm with FizzDizzle Customer Service whether or not your order has been placed.

Nouns

• You
• Confirmation number
• Confirmation page
• Confirmation email
• Payment information
• Error message
• Service interruption
• FizzDizzle Customer Service
• Order

Example – Identifying Business Terms

Verbs

• Receive
• Submitting
• Experience
• Confirm
• Placed
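One way to check a hand-built list of candidate terms against the source passage is to count how often each term literally appears. The sketch below is only an illustration: the function name `count_term_uses` and the term list are assumptions, and it does no part-of-speech tagging. Terms that score zero are implied by the wording rather than stated, so they deserve a second look from the data steward.

```python
import re

# Policy passage from the slide, with punctuation restored.
POLICY_TEXT = (
    "If you do not receive a confirmation number (in the form of a "
    "confirmation page or email) after submitting payment information, or "
    "if you experience an error message or service interruption after "
    "submitting payment information, it is your responsibility to confirm "
    "with FizzDizzle Customer Service whether or not your order has been "
    "placed."
)

# Candidate noun terms collected by hand (taken from the slide's noun list).
CANDIDATE_TERMS = [
    "confirmation number",
    "confirmation page",
    "confirmation email",
    "payment information",
    "error message",
    "service interruption",
    "FizzDizzle Customer Service",
    "order",
]


def count_term_uses(text, terms):
    """Count case-insensitive, whole-phrase occurrences of each term."""
    counts = {}
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


term_counts = count_term_uses(POLICY_TEXT, CANDIDATE_TERMS)

# Terms that never appear literally are implied by the phrasing
# ("confirmation page or email") and need human review.
implied_terms = [term for term, n in term_counts.items() if n == 0]
```

Here "payment information" is used twice, while "confirmation email" never appears as a literal phrase even though the passage clearly implies it.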

Entities & Characteristics

Entities are core concepts that are mapped to conceptual data domain models, such as:

Customer, Organization, Order, Product

The conceptual data domain is mapped to a container, such as:

File, table, object

Characteristics are attributes of entities modeled as data element concepts

Data element concepts are mapped to data elements

Characteristics

A data element concept has values taken from a conceptual domain

One data element concept might be mapped to more than one instantiation as a data element

A data element has values taken from a value domain

A conceptual domain might be mapped to more than one instantiation as a value domain

More on Entities & Characteristics

An entity may have many characteristics

A data domain may refer to many data element concepts

An instantiated container may have many data elements
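The relationships described on these slides can be sketched as a small object model. This is a minimal illustration, not a prescribed metamodel: the class names follow the slide vocabulary, while the sample entity, systems, and domain names ("CRM", "Billing", "US State") are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConceptualDomain:
    """A set of value meanings, independent of any encoding."""
    name: str
    concepts: List[str]


@dataclass
class ValueDomain:
    """One concrete encoding of a conceptual domain."""
    name: str
    conceptual_domain: ConceptualDomain
    values: List[str]


@dataclass
class DataElement:
    """A data element concept instantiated in a specific container."""
    name: str
    container: str
    value_domain: ValueDomain


@dataclass
class DataElementConcept:
    """A characteristic of an entity; it may have many instantiations."""
    name: str
    conceptual_domain: ConceptualDomain
    data_elements: List[DataElement] = field(default_factory=list)


@dataclass
class Entity:
    """A core concept such as Customer, Organization, Order, or Product."""
    name: str
    characteristics: List[DataElementConcept] = field(default_factory=list)


# Hypothetical example: one conceptual domain with two value domains.
us_state = ConceptualDomain("US State", ["Maryland", "Virginia"])
state_codes = ValueDomain("Two-letter codes", us_state, ["MD", "VA"])
state_names = ValueDomain("Full names", us_state, ["Maryland", "Virginia"])

# One data element concept instantiated as two data elements.
customer_state = DataElementConcept("State", us_state)
customer_state.data_elements.append(
    DataElement("CUST_STATE", "CRM.customer", state_codes))
customer_state.data_elements.append(
    DataElement("state_name", "Billing.account", state_names))

customer = Entity("Customer", [customer_state])
```

Keeping the conceptual domain separate from its value domains is what later makes it possible to ask whether two encodings can be harmonized.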

Enumeration of Data Elements

Each data element concept is manifested as one or more data elements in a specific system

A template is used to map data element concepts to the data elements in use

Data Element Identifier: 20988
Name: Customer State
Data Element Concept: State
Usage: Salesforce.com
Conceptual Domain: CST
Value Domain: VST-1
Storage Data Type: Char(2)
Presentation Data Type: Char(2)
Unit of Measure: N/A
Business Rules: May not be null
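A template row like this one can be held as a typed record so that its business rules are checkable. The sketch below is an assumption-laden illustration: the field names, the `check_value` helper, and the way the "May not be null" rule and the `Char(2)` length are interpreted are mine, not part of the presentation.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DataElementRecord:
    """One row of the data element template (field names are assumed)."""
    identifier: str
    name: str
    concept: str
    usage: str
    conceptual_domain: str
    value_domain: str
    storage_type: str
    presentation_type: str
    unit_of_measure: Optional[str]
    business_rules: Tuple[str, ...]


# The sample row from the slide.
CUSTOMER_STATE = DataElementRecord(
    identifier="20988",
    name="Customer State",
    concept="State",
    usage="Salesforce.com",
    conceptual_domain="CST",
    value_domain="VST-1",
    storage_type="Char(2)",
    presentation_type="Char(2)",
    unit_of_measure=None,
    business_rules=("May not be null",),
)


def check_value(record, value):
    """Return a list of problems found for one candidate value."""
    problems = []
    if value is None:
        if "May not be null" in record.business_rules:
            problems.append("null value not allowed")
        return problems
    # Assumed reading of Char(n): at most n characters.
    match = re.fullmatch(r"Char\((\d+)\)", record.storage_type)
    if match and len(value) > int(match.group(1)):
        problems.append(f"value longer than {record.storage_type}")
    return problems
```

For example, `check_value(CUSTOMER_STATE, "MD")` reports no problems, while a null or a spelled-out state name is flagged.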

Business Terms and Data Element Concepts

Map use of a business term to a definition then to the entity or characteristic

Customer is used in reference to the customer entity

Account Number is used in reference to an attribute of a customer entity

Need to track list of data element concepts

Concept ID   Concept                    Business Term   Definition ID
16-A334      License or Permit Holder   Licensee        BT-977
16-A334      License or Permit Holder   Permit Holder   BT-983
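The tracking table can be grouped by concept to show which business terms point at the same concept through different definitions. This is a small sketch using the two rows from the slide; the function names are illustrative assumptions.

```python
from collections import defaultdict

# Rows from the slide: (concept ID, concept, business term, definition ID).
GLOSSARY_ROWS = [
    ("16-A334", "License or Permit Holder", "Licensee", "BT-977"),
    ("16-A334", "License or Permit Holder", "Permit Holder", "BT-983"),
]


def terms_by_concept(rows):
    """Group (business term, definition ID) pairs under each concept ID."""
    grouped = defaultdict(list)
    for concept_id, _concept, term, definition_id in rows:
        grouped[concept_id].append((term, definition_id))
    return dict(grouped)


def concepts_needing_review(rows):
    """Concept IDs reached through more than one distinct definition."""
    return sorted(
        concept_id
        for concept_id, terms in terms_by_concept(rows).items()
        if len({definition_id for _term, definition_id in terms}) > 1
    )
```

A concept that shows up here is a candidate for the harmonize-or-differentiate decision: either the definitions are consolidated, or the terms are split into separately qualified concepts.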

Data Harmonization

Sources: Regulations, Policies, Documents, Forms

Extract & Collect: Extract data definitions from current guidance documents, etc., and categorize definitions using standard terminology

Integrate & Collate: Integrate data elements into a single reference source, then combine common data elements

Identify Anomalies: Identify inconsistent definitions, conflicts in data domains, format variations

Resolve and Standardize: Identify authoritative source for definitions; assign names using naming convention; document in metadata registry

Extract & Collect

Collected metadata includes

Assigned identifier

Data element name

Related business terms

Definition

Data type

Length

Business rules

Issues/Comments

Reference domain

Standard name

Authoritative sources

Lineage

Data Element   Type
FirstName      VARCHAR(35)
LastName       VARCHAR(40)
SSN            CHAR(11)
Telephone      VARCHAR(20)

Data Element   Type
First          VARCHAR(25)
Middle         VARCHAR(25)
Last           VARCHAR(30)
SocialSec      CHAR(9)
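Once the element lists from two sources are collected, the differences between them can be listed mechanically. The sketch below uses the two layouts shown above; the pairing in `ELEMENT_MAP` is an assumption based on the element names, and the helper names are illustrative.

```python
import re

SYSTEM_A = {
    "FirstName": "VARCHAR(35)",
    "LastName": "VARCHAR(40)",
    "SSN": "CHAR(11)",
    "Telephone": "VARCHAR(20)",
}
SYSTEM_B = {
    "First": "VARCHAR(25)",
    "Middle": "VARCHAR(25)",
    "Last": "VARCHAR(30)",
    "SocialSec": "CHAR(9)",
}

# Assumed correspondence between elements, inferred from their names.
ELEMENT_MAP = {"FirstName": "First", "LastName": "Last", "SSN": "SocialSec"}


def parse_type(declaration):
    """Split a declaration such as 'VARCHAR(35)' into ('VARCHAR', 35)."""
    match = re.fullmatch(r"([A-Z]+)\((\d+)\)", declaration)
    if match is None:
        raise ValueError(f"unrecognized type: {declaration!r}")
    return match.group(1), int(match.group(2))


def find_conflicts(system_a, system_b, mapping):
    """List (element in A, element in B, kind) for each mismatch."""
    conflicts = []
    for name_a, name_b in mapping.items():
        type_a, length_a = parse_type(system_a[name_a])
        type_b, length_b = parse_type(system_b[name_b])
        if type_a != type_b:
            conflicts.append((name_a, name_b, "type"))
        if length_a != length_b:
            conflicts.append((name_a, name_b, "length"))
    return conflicts


def unmapped(system_a, system_b, mapping):
    """Elements that exist on only one side of the mapping."""
    only_a = sorted(set(system_a) - set(mapping))
    only_b = sorted(set(system_b) - set(mapping.values()))
    return only_a, only_b
```

With these inputs, every mapped pair differs in length (for instance `SSN` at 11 characters versus `SocialSec` at 9, which suggests one stores separators), and `Telephone` and `Middle` have no counterpart.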

Understanding Reference Data - Assessment

Data set 1

Jean Montard         0   062672
Michael Evans        0   112168
Fran Peterson        1   030276
Pat Lawson           1   041779

Data set 2

J Montard            M   062672
M Evans              M   112168
F Peterson           F   030276
P Lawson             F   041779

Data set 3

Jean J Montard       F   062672
Michael D Evans      F   112168
Fran S Peterson      M   030276
Pat O Lawson         M   041779

• Each of these data sets has matching records for unique individuals
• Each has a code value with a 1:1:1 correspondence
• Understand why the values differ in each data set
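The correspondence between the code values can be derived by matching records across data sets and recording which codes co-occur. The sketch below assumes records can be matched on last name plus the six-digit field, and treats both the code and that field as opaque strings; the function name is illustrative.

```python
# (last name, six-digit field, code) taken from the three data sets.
SET_1 = [("Montard", "062672", "0"), ("Evans", "112168", "0"),
         ("Peterson", "030276", "1"), ("Lawson", "041779", "1")]
SET_2 = [("Montard", "062672", "M"), ("Evans", "112168", "M"),
         ("Peterson", "030276", "F"), ("Lawson", "041779", "F")]
SET_3 = [("Montard", "062672", "F"), ("Evans", "112168", "F"),
         ("Peterson", "030276", "M"), ("Lawson", "041779", "M")]


def code_correspondence(left, right):
    """Map each code in `left` to the code its matching record has in `right`.

    Raises ValueError if the codes do not line up one-to-one.
    """
    right_codes = {(last, key): code for last, key, code in right}
    mapping = {}
    for last, key, code in left:
        other = right_codes[(last, key)]
        # setdefault stores the first pairing seen and returns it afterwards.
        if mapping.setdefault(code, other) != other:
            raise ValueError(f"code {code!r} maps to more than one value")
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("two codes map to the same value")
    return mapping
```

Comparing data sets 2 and 3 gives `{"M": "F", "F": "M"}`: the same letters are used with swapped assignments, so the two columns cannot be merged on face value even though the value sets are identical.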

Integrate & Collate

Columns: ID, Data Element Name, Definition, L, T, Data Element Rules, Issues/Comments, Mapping, Authoritative Source

ID: CORP-QWODR-25
Data Element Name: Quarterly Wage Employee Wage Amount
L: 11
T: AN
Data Element Rules: This field will contain the information as provided from the Quarterly Wage record submitted for State Filing
Mapping / Authoritative Source: PROPOSED State Corporate Quarterly Wage OUTPUT DETAIL RECORD

ID: EMPR-385
Data Element Name: WAGE AMOUNT
Definition: The amount of a person's wages during a Reporting Quarter
L: 11
T: Signed Numeric
Data Element Rules: 00000000000 through 99999999999. The last two positions are implied to be to the right of the decimal point. Conditional for the following output record: Federal Employee Locate Response Record
Mapping / Authoritative Source: Federal Match System

Identify Anomalies

Inconsistency or ambiguity for similarly-named data elements

Inconsistency of explicit data element business rules

Incomplete or inconsistent reference value domains

Inconsistent formats

Conflicting data types

Abbreviations vs full names

Columns: ID, Data Element Name, Definition, L, T, Data Element Rules, Mapping

ID: SWA-UI-OD-15
Data Element Name: Claimant State
Definition: [Lacking definition]
L: 2
T: AN
Data Element Rules: If present, this field will contain the Claimant State code as provided on the submitted UI record
Mapping: State UI Output Detail Record

ID: IRS-DME-15
Data Element Name: Sex
Definition: [Lacking definition]
L: 1
T: AN
Data Element Rules: Sex Code from the Person Table; will contain spaces if not present on the Person Table. [Values not specified]
Mapping: IRS DATA MATCH EXTRACT RECORD

ID: FCR-69
Data Element Name: Benefit Amount
Definition: The monetary amount of Unemployment Insurance benefits a person received during a Reporting Period. [This definition does not specify the Reporting Period; SWA specify quarter (see below)]
L: 11
T: AN
Data Element Rules: 00000000000 through 99999999999. The last two positions are implied to be to the right of the decimal point. This field will contain all zeroes when there is no Benefit Amount or the information is not available. Conditional for the following output record: Federal Match Record

ID: SWA-OD-19
Data Element Name: Benefit Amount
Definition: This field will contain the gross amount of UI benefits, prior to any deductions, paid to a claimant during the reporting quarter, as provided on the UI record submitted to the NDNH
L: 11
T: AN
Data Element Rules: Values are 00000000000 through 99999999999, without decimal. This field is whole dollars only. [Potential conflict with FCR-69]
Mapping: SWA UI Output Detail Record
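The two Benefit Amount elements share a name, length, and type, yet the same eleven digits mean different amounts under each rule. The sketch below shows the two readings side by side; the function names are illustrative assumptions, and only the rules quoted in the table are taken from the source.

```python
import re
from decimal import Decimal


def _check_raw(raw):
    """Both elements are 11 positions holding digits only."""
    if re.fullmatch(r"[0-9]{11}", raw) is None:
        raise ValueError(f"expected 11 digits, got {raw!r}")


def implied_two_decimals(raw):
    """FCR-69 rule: the last two positions are right of the decimal point."""
    _check_raw(raw)
    return Decimal(raw).scaleb(-2)


def whole_dollars(raw):
    """SWA-OD-19 rule: whole dollars only, no decimal."""
    _check_raw(raw)
    return Decimal(int(raw))


raw_amount = "00000123456"
as_fcr_69 = implied_two_decimals(raw_amount)    # 1234.56
as_swa_od_19 = whole_dollars(raw_amount)        # 123456
```

A value moved between the two fields without conversion is off by a factor of 100, which is why this pair is a candidate for differentiation (or for an explicit conversion rule) rather than a simple merge.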

Resolve & Standardize

Identify authoritative sources

Prioritize potential harmonized definitions

Review with subject matter experts

Consolidate if possible

Differentiate if necessary

Isomorphic Domains

We can say that value domains A and B are isomorphic if:

The cardinality of A is equal to the cardinality of B (they have the same number of values)

Both A and B are associated with a conceptual domain C with the same cardinality as A and B

There is an enumeration of value meanings in A that is a 1-1 mapping to concepts in C

There is an enumeration of value meanings in B that is a 1-1 mapping to concepts in C

In other words, if two value domains have a one-to-one mapping to the same conceptual domain values, they are isomorphic

Isomorphic domains can be harmonized
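The conditions above can be checked directly when each value domain is recorded as a mapping from value to the concept it means. This is a minimal sketch under that assumption; the function names and the sample "active/inactive" domains are hypothetical.

```python
def is_one_to_one_onto(value_meanings, concepts):
    """True if distinct values map to distinct concepts and cover them all."""
    targets = list(value_meanings.values())
    return len(set(targets)) == len(targets) and set(targets) == set(concepts)


def are_isomorphic(meanings_a, meanings_b, concepts):
    """Check the cardinality and 1-1 mapping conditions for A, B, and C."""
    return (
        len(meanings_a) == len(meanings_b) == len(concepts)
        and is_one_to_one_onto(meanings_a, concepts)
        and is_one_to_one_onto(meanings_b, concepts)
    )


def translation(meanings_a, meanings_b):
    """Build the A-to-B translation table via the shared concepts."""
    concept_to_b = {concept: value for value, concept in meanings_b.items()}
    return {value: concept_to_b[concept]
            for value, concept in meanings_a.items()}


# Hypothetical conceptual domain C and two encodings of it.
STATUS_CONCEPTS = {"active", "inactive"}
DOMAIN_A = {"Y": "active", "N": "inactive"}
DOMAIN_B = {"1": "active", "0": "inactive"}

# A domain whose two values mean the same thing is not isomorphic to A.
DOMAIN_BAD = {"1": "active", "2": "active"}
```

When the check passes, `translation(DOMAIN_A, DOMAIN_B)` yields the lookup table needed to harmonize one encoding into the other.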

Domain Congruence

Two value domains are congruent if their value sets intersect and the size of the intersection meets or exceeds a predefined threshold

Example

FIPS 2-Character State Codes contain values for all US States

USPS 2-Character State Codes contain values for all US States, as well as AA, AE, and AP postal codes for military delivery

Under certain circumstances, the two domains can be harmonized
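The congruence test reduces to a set intersection compared against a threshold. The sketch below reads the threshold as an absolute count of shared values, which is one interpretation of the definition; the sample sets are small illustrative subsets, not the full FIPS or USPS code lists.

```python
def are_congruent(domain_a, domain_b, threshold):
    """True if the domains share at least `threshold` values."""
    return len(set(domain_a) & set(domain_b)) >= threshold


def only_in_first(domain_a, domain_b):
    """Values that must be handled separately before harmonizing."""
    return sorted(set(domain_a) - set(domain_b))


# Illustrative subsets only.
FIPS_SAMPLE = {"MD", "NY", "VA"}
USPS_SAMPLE = {"MD", "NY", "VA", "AA", "AE", "AP"}

congruent = are_congruent(FIPS_SAMPLE, USPS_SAMPLE, threshold=3)
extra_usps = only_in_first(USPS_SAMPLE, FIPS_SAMPLE)
```

The values left over on one side (here the military delivery codes) are what decide whether harmonization is safe for a given use: they are harmless for a report limited to states, but not for mailing.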

Encouraging a Culture of Semantic Harmony

Small variances in definitions within isolated functions become magnified when data is shared across functions

Establish a level playing field by:

Instituting a common business term glossary

Harmonizing business term definitions

Unifying shared reference data into conceptual domains and corresponding value domains

Socializing use of shared metadata

Establishing standards for future development

Integrate methods for monitoring compliance with standards

Check Out These Resources

www.knowledge-integrity.com

www.dataqualitybook.com

If you have questions, comments, or suggestions, please contact me:

David Loshin

301-754-6350

loshin@knowledge-integrity.com

Page 2: Metadata Melodies Webinar with David Loshin Presentation

Common Business Terms Are They Really Common

copy 2013 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

2

What is a ldquostaterdquo

Challenges in Semantic Consistency

Data definitions are often biased around specific business function requirements

The meaning of a concept may slightly differ from application to application

Consolidation without considering semantics will lead to confusion downstream

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

3

ldquoLocationrdquo

Municipal

Taxation

Mailing

Delivery

Utility

Evolution of the Business Metadata Glossary

Many sources of entity concepts and business terms may conflict with each other

The data steward must facilitate the collection and documentation of business terms

The data steward must also prepare for harmonization of terms

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

4

Policies

System Docs

Processes

Models

Standards

Applications

Business Rules

Profiling

Etc

Entity Concepts

BusinessTerms

DefinitionContextual

Meaning

hellip hellip

DefinitionContextual

Meaning

DefinitionContextual

Meaning

DefinitionContextual

Meaning

Example ndash Identifying Business Terms

Order ConfirmationIf you do not receive a confirmation number (in the form of a confirmation page or email) after submitting payment information or if you experience an error message or service interruption after submitting payment information it is your responsibility to confirm with FizzDizzleCustomer Service whether or not your order has been placed

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

5

bull Youbull Confirmation numberbull Confirmation pagebull Confirmation emailbull Payment informationbull Error messagebull Service interruptionbull FizzDizzle Customer Servicebull Order

Nouns

Example ndash Identifying Business Terms

Order ConfirmationIf you do not receive a confirmation number (in the form of a confirmation page or email) after submitting payment information or if you experience an error message or service interruption after submitting payment information it is your responsibility to confirm with FizzDizzleCustomer Service whether or not your order has been placed

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

6

bull Receivebull Submittingbull Experiencebull Confirmbull Placed

Verbs

Entities amp Characteristics

Entities are core concepts that are mapped to conceptual data domain models such as

Customer Organization Order Product

The conceptual Data Domain is mapped to a container such as

File table object

Characteristics are attributes of entities modeled as data element concepts

Data element concepts are mapped to data elements

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

7

Characteristics

A data element concept has values taken from a conceptual domain

One data element concept might be mapped to more than one instantiation as a data element

A data element has values taken from a value domain

A conceptual domain might be mapped to more than one instantiation as a value domain

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

8

More on Entities amp Characteristics

An entity may have many characteristics

A data domain may refer to many data element concepts

An instantiated container may have many data elements

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

9

Enumeration of Data Elements

Each data element concept is manifested as one or more data elements in a specific system

The template is used to map data element concepts to used data elements

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

10

Data Element Identifier

NameData

Element Concept

UsageConceptual

DomainValue

DomainStorage

Data TypePresentation

Data TypeUnit of

MeasureBusiness

Rules

20988Customer State State Salesforcecom CST VST-1 Char(2) Char(2) NA

May not be null

Business Terms and Data Element Concepts

Map use of a business term to a definition then to the entity or characteristic

Customer is used in reference to the customer entity

Account Number is used in reference to an attribute of a customer entity

Need to track list of data element concepts

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

11

Concept ID Concept Business Term Definition ID

16-A334License or Permit Holder Licensee BT-977

16-A334License or Permit Holder Permit Holder BT-983

Data Harmonization

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

12

Identify inconsistent definitions conflicts

in data domains format variations

Extract data definitions from current guidance documents etc and categorize definitions

using standard terminology

Integrate data elements into a single reference

source then combine common

data elements

Identify authoritative source

for definitionsAssign names using Naming Convention

Document in metadata Registry

Identify AnomaliesResolve and Standardize

Integrate amp CollateExtract amp Collect

Regu

latio

ns

Po

licie

s

Documents

Forms

Extract amp Collect

Collected metadata includes

Assigned identifier

Data element name

Related business terms

Definition

Data type

Length

Business rules

IssuesComments

Reference domain

Standard name

Authoritative sources

Lineage

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

13

Data Element

Type

FirstName VARCHAR(35)

LastName VARCHAR(40)

SSN CHAR(11)

Telephone VARCHAR(20)

Data Element

Type

First VARCHAR(25)

Middle VARCHAR(25)

Last VARCHAR(30)

SocialSec CHAR(9)

Understanding Reference Data - Assessment

copy 2013 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

14

Jean Montard 0 062672

Michael Evans 0 112168

Fran Peterson 1 030276

Pat Lawson 1 041779

J Montard M 062672

M Evans M 112168

F Peterson F 030276

P Lawson F 041779

bull Each of these data sets have matching records for unique individuals

bull Each has a code value with a 111 correspondence

bull Understand why the values differ in each data set

Jean J Montard F 062672

Michael D Evans F 112168

Fran S Peterson M 030276

Pat O Lawson M 041779

15

Integrate amp Collate

CORP-

QWODR-25

Quarterly Wage

Employee Wage

Amount

11 AN This field will contain the

information as provided from

the Quarterly Wage record

submitted for State Filing

PROPOSED State

Corporate

Quarterly Wage

OUTPUT DETAIL

RECORD

EMPR-385 WAGE AMOUNT The amount of a

personrsquos wages

during a

Reporting

Quarter

11 Signed

Numeric

00000000000 through

99999999999 The last two

positions are implied to be to

the right of the decimal point

Conditional for the following

output record Federal

Employee Locate Response

Record

Federal Match

System

ID Data Element

Name

Definition L T Data Element Rules IssuesCom

ments

Mapping Authoritative

Source

CORP-

QWODR-25

Quarterly Wage

Employee Wage

Amount

11 AN This field will contain the

information as provided from

the Quarterly Wage record

submitted for State Filing

PROPOSED State

Corporate

Quarterly Wage

OUTPUT DETAIL

RECORD

EMPR-385 WAGE AMOUNT The amount of a

personrsquos wages

during a

Reporting

Quarter

11 Signed

Numeric

00000000000 through

99999999999 The last two

positions are implied to be to

the right of the decimal point

Conditional for the following

output record Federal

Employee Locate Response

Record

Federal Match

System

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

Identify Anomalies

Inconsistency or ambiguity for similarly-named data elements

Inconsistency of explicit data element business rules

Incomplete or inconsistent reference value domains

Inconsistent formats

Conflicting data types

Abbreviations vs full names

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

16

ID Data Element

Name

Definition L T Data Element Rules Mapping

SWA-UI-

OD-15

Claimant State Lacking definition 2 AN If present this field will contain the Claimant

State code as provided on the submitted UI

record

State UI Output Detail

Record

IRS-DME-

15

Sex Lacking definition 1 AN Sex Code from the Person Table will contain

spaces if not present on the Person Table

Values not specified

IRS DATA MATCH

EXTRACT RECORD

FCR-69 Benefit Amount The monetary amount of Unemployment

Insurance benefits a person received during

a Reporting Period This definition

does not specify the Reporting

Period SWA specify quarter

[see below]

11 AN 00000000000 through 99999999999 The last

two positions are implied to be to

the right of the decimal point This field

will contain all zeroes when there is no Benefit

Amount or the information is not available

Conditional for the following output record bull

Federal Match Record

SWA-OD-

19

Benefit Amount This field will contain the gross amount of UI

benefits prior to any deductions paid to a

claimant during the reporting quarter as

provided on the UI record submitted to the

NDNH

11 AN Values are 00000000000 through 99999999999

without decimal

This field is whole dollars only

Potential conflict with FCR-69

SWA UI Output Detail

Record

Resolve amp Standardize

Identify authoritative

sources

Prioritize potential

harmonized definitions

Review with subject matter experts

Consolidate if possible

Differentiate if necessary

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

17

Isomorphic Domains

We can say that value domains A and B are isomorphic if

The cardinality of A is equal to the cardinality of B (they have the same number of values)

Both A and B are associated with a conceptual domain C with the same cardinality as A and B

There is an enumeration of value meanings in A that is a 1-1 mapping to concepts in C

There is an enumeration of value meanings in B that is a 1-1 mapping to concepts in C

In other words if two value domains have a one-to-one mapping to the same conceptual domain values they are isomorphic

Isomorphic domains can be harmonized

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

18

Domain Congruence

Two value domains are congruent if their value sets intersect and the size of the intersection meets or exceeds a predefined threshold

Example

FIPS 2-Character State Codes contain values for all US States

USPS 2-Character State Codes contain values for all US States as well as AA AE and AP postal codes for military delivery

Under certain circumstances the two domains can be harmonized

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

19

Encouraging a Culture of Semantic Harmony

Small variance in definitions in isolated functions become magnified when data is shared across functions

Establish a level playing ground by

Instituting a common business term glossary

Harmonizing business term definitions

Unifying shared reference data into conceptual domains and corresponding value domains

Socializing use of shared metadata

Establishing standards for future development

Integrate methods for monitoring compliance with standards

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

20

Check Out These Resources

wwwknowledge-integritycom

wwwdataqualitybookcom

If you have questions comments or suggestions please contact me

David Loshin

301-754-6350

loshinknowledge-integritycom

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301) 754-6350

21

Page 3: Metadata Melodies Webinar with David Loshin Presentation

Challenges in Semantic Consistency

Data definitions are often biased around specific business function requirements

The meaning of a concept may slightly differ from application to application

Consolidation without considering semantics will lead to confusion downstream

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

3

ldquoLocationrdquo

Municipal

Taxation

Mailing

Delivery

Utility

Evolution of the Business Metadata Glossary

Many sources of entity concepts and business terms may conflict with each other

The data steward must facilitate the collection and documentation of business terms

The data steward must also prepare for harmonization of terms

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

4

Policies

System Docs

Processes

Models

Standards

Applications

Business Rules

Profiling

Etc

Entity Concepts

BusinessTerms

DefinitionContextual

Meaning

hellip hellip

DefinitionContextual

Meaning

DefinitionContextual

Meaning

DefinitionContextual

Meaning

Example ndash Identifying Business Terms

Order ConfirmationIf you do not receive a confirmation number (in the form of a confirmation page or email) after submitting payment information or if you experience an error message or service interruption after submitting payment information it is your responsibility to confirm with FizzDizzleCustomer Service whether or not your order has been placed

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

5

bull Youbull Confirmation numberbull Confirmation pagebull Confirmation emailbull Payment informationbull Error messagebull Service interruptionbull FizzDizzle Customer Servicebull Order

Nouns

Example ndash Identifying Business Terms

Order ConfirmationIf you do not receive a confirmation number (in the form of a confirmation page or email) after submitting payment information or if you experience an error message or service interruption after submitting payment information it is your responsibility to confirm with FizzDizzleCustomer Service whether or not your order has been placed

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

6

bull Receivebull Submittingbull Experiencebull Confirmbull Placed

Verbs

Entities amp Characteristics

Entities are core concepts that are mapped to conceptual data domain models such as

Customer Organization Order Product

The conceptual Data Domain is mapped to a container such as

File table object

Characteristics are attributes of entities modeled as data element concepts

Data element concepts are mapped to data elements

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

7

Characteristics

A data element concept has values taken from a conceptual domain

One data element concept might be mapped to more than one instantiation as a data element

A data element has values taken from a value domain

A conceptual domain might be mapped to more than one instantiation as a value domain

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

8

More on Entities amp Characteristics

An entity may have many characteristics

A data domain may refer to many data element concepts

An instantiated container may have many data elements

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

9

Enumeration of Data Elements

Each data element concept is manifested as one or more data elements in a specific system

The template is used to map data element concepts to used data elements

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

10

Data Element Identifier

NameData

Element Concept

UsageConceptual

DomainValue

DomainStorage

Data TypePresentation

Data TypeUnit of

MeasureBusiness

Rules

20988Customer State State Salesforcecom CST VST-1 Char(2) Char(2) NA

May not be null

Business Terms and Data Element Concepts

Map use of a business term to a definition then to the entity or characteristic

Customer is used in reference to the customer entity

Account Number is used in reference to an attribute of a customer entity

Need to track list of data element concepts

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

11

Concept ID Concept Business Term Definition ID

16-A334License or Permit Holder Licensee BT-977

16-A334License or Permit Holder Permit Holder BT-983

Data Harmonization

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

12

Identify inconsistent definitions conflicts

in data domains format variations

Extract data definitions from current guidance documents etc and categorize definitions

using standard terminology

Integrate data elements into a single reference

source then combine common

data elements

Identify authoritative source

for definitionsAssign names using Naming Convention

Document in metadata Registry

Identify AnomaliesResolve and Standardize

Integrate amp CollateExtract amp Collect

Regu

latio

ns

Po

licie

s

Documents

Forms

Extract amp Collect

Collected metadata includes

Assigned identifier

Data element name

Related business terms

Definition

Data type

Length

Business rules

IssuesComments

Reference domain

Standard name

Authoritative sources

Lineage

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

13

Data Element

Type

FirstName VARCHAR(35)

LastName VARCHAR(40)

SSN CHAR(11)

Telephone VARCHAR(20)

Data Element

Type

First VARCHAR(25)

Middle VARCHAR(25)

Last VARCHAR(30)

SocialSec CHAR(9)

Understanding Reference Data - Assessment

copy 2013 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

14

Jean Montard 0 062672

Michael Evans 0 112168

Fran Peterson 1 030276

Pat Lawson 1 041779

J Montard M 062672

M Evans M 112168

F Peterson F 030276

P Lawson F 041779

bull Each of these data sets have matching records for unique individuals

bull Each has a code value with a 111 correspondence

bull Understand why the values differ in each data set

Jean J Montard F 062672

Michael D Evans F 112168

Fran S Peterson M 030276

Pat O Lawson M 041779

15

Integrate & Collate

Columns: ID | Data Element Name | Definition | L | T | Data Element Rules | Issues/Comments | Mapping | Authoritative Source

ID: CORP-QWODR-25
Data Element Name: Quarterly Wage Employee Wage Amount
L: 11
T: AN
Data Element Rules: This field will contain the information as provided from the Quarterly Wage record submitted for State Filing
Mapping / Authoritative Source: PROPOSED State Corporate Quarterly Wage OUTPUT DETAIL RECORD

ID: EMPR-385
Data Element Name: WAGE AMOUNT
Definition: The amount of a person's wages during a Reporting Quarter
L: 11
T: Signed Numeric
Data Element Rules: 00000000000 through 99999999999. The last two positions are implied to be to the right of the decimal point. Conditional for the following output record: Federal Employee Locate Response Record
Mapping / Authoritative Source: Federal Match System

Identify Anomalies

Inconsistency or ambiguity for similarly-named data elements

Inconsistency of explicit data element business rules

Incomplete or inconsistent reference value domains

Inconsistent formats

Conflicting data types

Abbreviations vs full names

Columns: ID | Data Element Name | Definition | L | T | Data Element Rules | Mapping

ID: SWA-UI-OD-15
Data Element Name: Claimant State
Definition: Lacking definition
L: 2
T: AN
Data Element Rules: If present, this field will contain the Claimant State code as provided on the submitted UI record
Mapping: State UI Output Detail Record

ID: IRS-DME-15
Data Element Name: Sex
Definition: Lacking definition
L: 1
T: AN
Data Element Rules: Sex Code from the Person Table; will contain spaces if not present on the Person Table. Values not specified
Mapping: IRS DATA MATCH EXTRACT RECORD

ID: FCR-69
Data Element Name: Benefit Amount
Definition: The monetary amount of Unemployment Insurance benefits a person received during a Reporting Period. [This definition does not specify the Reporting Period; SWA specify quarter (see below)]
L: 11
T: AN
Data Element Rules: 00000000000 through 99999999999. The last two positions are implied to be to the right of the decimal point. This field will contain all zeroes when there is no Benefit Amount or the information is not available. Conditional for the following output record: Federal Match Record

ID: SWA-OD-19
Data Element Name: Benefit Amount
Definition: This field will contain the gross amount of UI benefits, prior to any deductions, paid to a claimant during the reporting quarter, as provided on the UI record submitted to the NDNH
L: 11
T: AN
Data Element Rules: Values are 00000000000 through 99999999999, without decimal. This field is whole dollars only. Potential conflict with FCR-69
Mapping: SWA UI Output Detail Record
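The potential conflict between FCR-69 and SWA-OD-19 can be made concrete with a small sketch. Both fields are 11 characters wide, but the rules above read one with two implied decimal places and the other as whole dollars. The function names and the sample value below are made up for illustration; only the two interpretation rules come from the table.

```python
from decimal import Decimal


def _check_raw(raw):
    """Validate an 11-character, digits-only field value."""
    if len(raw) != 11 or not (raw.isascii() and raw.isdigit()):
        raise ValueError(f"expected 11 digits, got {raw!r}")


def parse_implied_decimal(raw, scale=2):
    """FCR-69 style: the last `scale` positions are right of the decimal point."""
    _check_raw(raw)
    return Decimal(int(raw)).scaleb(-scale)


def parse_whole_dollars(raw):
    """SWA-OD-19 style: the value is whole dollars, no decimal implied."""
    _check_raw(raw)
    return Decimal(int(raw))


raw_value = "00000123456"  # hypothetical field content
as_fcr_69 = parse_implied_decimal(raw_value)
as_swa_od_19 = parse_whole_dollars(raw_value)
print(as_fcr_69, as_swa_od_19)
```

The same stored string differs by a factor of 100 depending on which rule is applied, so the two Benefit Amount elements cannot be consolidated without an explicit conversion.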

Resolve & Standardize

Identify authoritative sources

Prioritize potential harmonized definitions

Review with subject matter experts

Consolidate if possible

Differentiate if necessary

Isomorphic Domains

We can say that value domains A and B are isomorphic if:

The cardinality of A is equal to the cardinality of B (they have the same number of values)

Both A and B are associated with a conceptual domain C with the same cardinality as A and B

There is an enumeration of value meanings in A that is a 1-1 mapping to concepts in C

There is an enumeration of value meanings in B that is a 1-1 mapping to concepts in C

In other words, if two value domains each have a one-to-one mapping to the values of the same conceptual domain, they are isomorphic

Isomorphic domains can be harmonized
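The conditions above can be checked mechanically once the value meanings are written down. The following is a minimal sketch, assuming each value domain is represented as a dictionary from value to concept; the meanings assigned to "0"/"1" and "M"/"F" are hypothetical, and `are_isomorphic` and `harmonize` are illustrative names rather than an established API.

```python
def is_one_to_one_onto(meanings, concepts):
    """True if `meanings` maps its values 1-1 onto the full set of concepts."""
    targets = list(meanings.values())
    return len(set(targets)) == len(targets) and set(targets) == set(concepts)


def are_isomorphic(a_meanings, b_meanings, concepts):
    """Check the cardinality and 1-1 mapping conditions for domains A and B."""
    return (len(a_meanings) == len(b_meanings) == len(concepts)
            and is_one_to_one_onto(a_meanings, concepts)
            and is_one_to_one_onto(b_meanings, concepts))


def harmonize(a_meanings, b_meanings):
    """Translate each A value to the B value that shares its concept."""
    concept_to_b = {concept: value for value, concept in b_meanings.items()}
    return {a_value: concept_to_b[concept]
            for a_value, concept in a_meanings.items()}


CONCEPTS = {"male", "female"}                  # conceptual domain C
DOMAIN_A = {"0": "male", "1": "female"}        # assumed meanings
DOMAIN_B = {"M": "male", "F": "female"}        # assumed meanings
DOMAIN_WITH_UNKNOWN = {"M": "male", "F": "female", "U": "unknown"}

print(are_isomorphic(DOMAIN_A, DOMAIN_B, CONCEPTS))
print(harmonize(DOMAIN_A, DOMAIN_B))
print(are_isomorphic(DOMAIN_A, DOMAIN_WITH_UNKNOWN, CONCEPTS))
```

The translation table produced by `harmonize` is only meaningful when `are_isomorphic` holds; a domain with an extra "unknown" value fails the cardinality condition and needs a different treatment.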

Domain Congruence

Two value domains are congruent if their value sets intersect and the size of the intersection meets or exceeds a predefined threshold

Example:

FIPS 2-Character State Codes contain values for all US States

USPS 2-Character State Codes contain values for all US States as well as AA, AE, and AP postal codes for military delivery

Under certain circumstances, the two domains can be harmonized
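A congruence test along these lines might look like the sketch below. Two points are assumptions made for the example: the threshold is expressed as a fraction of the larger domain (the definition above only says the intersection size must meet a predefined threshold), and the code sets follow the simplified description above, with the 50 state abbreviations in both domains and only AA, AE, and AP added on the USPS side.

```python
US_STATES = frozenset(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY".split()
)

# Simplified domains, following the example rather than the full standards.
FIPS_STATE_CODES = US_STATES
USPS_STATE_CODES = US_STATES | {"AA", "AE", "AP"}


def overlap_ratio(a, b):
    """Size of the intersection relative to the larger of the two domains."""
    if not a or not b:
        return 0.0
    return len(a & b) / max(len(a), len(b))


def are_congruent(a, b, threshold=0.9):
    """True if the domains intersect and the overlap meets the threshold."""
    return bool(a & b) and overlap_ratio(a, b) >= threshold


print(len(FIPS_STATE_CODES & USPS_STATE_CODES))
print(are_congruent(FIPS_STATE_CODES, USPS_STATE_CODES))
print(sorted(USPS_STATE_CODES - FIPS_STATE_CODES))
```

The values left over after the intersection (here the three military codes) are what determine whether harmonization is safe: a consumer that only accepts the smaller domain would reject them.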

Encouraging a Culture of Semantic Harmony

Small variances in definitions in isolated functions become magnified when data is shared across functions

Establish a level playing ground by:

Instituting a common business term glossary

Harmonizing business term definitions

Unifying shared reference data into conceptual domains and corresponding value domains

Socializing use of shared metadata

Establishing standards for future development

Integrating methods for monitoring compliance with standards

Check Out These Resources

www.knowledge-integrity.com

www.dataqualitybook.com

If you have questions, comments, or suggestions, please contact me:

David Loshin

301-754-6350

loshin@knowledge-integrity.com

Evolution of the Business Metadata Glossary

Many sources of entity concepts and business terms may conflict with each other

The data steward must facilitate the collection and documentation of business terms

The data steward must also prepare for harmonization of terms

[Diagram: sources of entity concepts and business terms (policies, system docs, processes, models, standards, applications, business rules, profiling, etc.). Each business term may have several definitions, each with its own contextual meaning.]

Example - Identifying Business Terms

Order Confirmation

If you do not receive a confirmation number (in the form of a confirmation page or email) after submitting payment information, or if you experience an error message or service interruption after submitting payment information, it is your responsibility to confirm with FizzDizzle Customer Service whether or not your order has been placed.

Nouns:

• You
• Confirmation number
• Confirmation page
• Confirmation email
• Payment information
• Error message
• Service interruption
• FizzDizzle Customer Service
• Order

Example - Identifying Business Terms

Order Confirmation

If you do not receive a confirmation number (in the form of a confirmation page or email) after submitting payment information, or if you experience an error message or service interruption after submitting payment information, it is your responsibility to confirm with FizzDizzle Customer Service whether or not your order has been placed.

Verbs:

• Receive
• Submitting
• Experience
• Confirm
• Placed

Entities & Characteristics

Entities are core concepts that are mapped to conceptual data domain models, such as:

Customer, Organization, Order, Product

The conceptual Data Domain is mapped to a container, such as:

File, table, object

Characteristics are attributes of entities modeled as data element concepts

Data element concepts are mapped to data elements

Characteristics

A data element concept has values taken from a conceptual domain

One data element concept might be mapped to more than one instantiation as a data element

A data element has values taken from a value domain

A conceptual domain might be mapped to more than one instantiation as a value domain
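The relationships listed above can be sketched as a small set of Python dataclasses. This is only one possible representation under stated assumptions: the class and field names are chosen for illustration, and the example systems, element names, and the "Sex" domain values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptualDomain:
    name: str
    concepts: frozenset


@dataclass(frozen=True)
class ValueDomain:
    """One instantiation of a conceptual domain as concrete values."""
    name: str
    conceptual_domain: ConceptualDomain
    values: frozenset


@dataclass(frozen=True)
class DataElementConcept:
    """Takes its values from a conceptual domain."""
    name: str
    conceptual_domain: ConceptualDomain


@dataclass(frozen=True)
class DataElement:
    """One instantiation of a data element concept in a specific system."""
    name: str
    system: str
    concept: DataElementConcept
    value_domain: ValueDomain

    def __post_init__(self):
        # The value domain must instantiate the concept's conceptual domain.
        if self.value_domain.conceptual_domain != self.concept.conceptual_domain:
            raise ValueError(f"{self.name}: value domain does not match concept")


def instantiations(concept, data_elements):
    """All data elements that instantiate the given data element concept."""
    return [e for e in data_elements if e.concept == concept]


sex = ConceptualDomain("Sex", frozenset({"male", "female"}))
sex_numeric = ValueDomain("Sex (numeric)", sex, frozenset({"0", "1"}))
sex_letter = ValueDomain("Sex (letter)", sex, frozenset({"M", "F"}))
person_sex = DataElementConcept("Person Sex", sex)

elements = [
    DataElement("SEX_CD", "system_a", person_sex, sex_numeric),  # hypothetical
    DataElement("Gender", "system_b", person_sex, sex_letter),   # hypothetical
]
print([e.name for e in instantiations(person_sex, elements)])
```

Here one data element concept has two instantiations, and the single conceptual domain has two value domains, matching the "more than one instantiation" statements above.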

More on Entities & Characteristics

An entity may have many characteristics

A data domain may refer to many data element concepts

An instantiated container may have many data elements

Enumeration of Data Elements

Each data element concept is manifested as one or more data elements in a specific system

The template is used to map data element concepts to the data elements in use

Data Element Identifier: 20988
Name: Customer State
Data Element Concept: State
Usage: Salesforce.com
Conceptual Domain: CST
Value Domain: VST-1
Storage Data Type: Char(2)
Presentation Data Type: Char(2)
Unit of Measure: N/A
Business Rules: May not be null

Business Terms and Data Element Concepts

Map each use of a business term to a definition, then to the entity or characteristic

Customer is used in reference to the customer entity

Account Number is used in reference to an attribute of a customer entity

Need to track list of data element concepts

Concept ID   Concept                    Business Term   Definition ID
16-A334      License or Permit Holder   Licensee        BT-977
16-A334      License or Permit Holder   Permit Holder   BT-983
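Tracking the list of data element concepts makes it straightforward to find terms that share a concept. The sketch below groups the two rows above by concept and reports concepts that more than one business term points to; the function names are illustrative, and treating such concepts as candidates for harmonization is an interpretation of the table, not a rule stated on the slide.

```python
from collections import defaultdict

# (concept ID, concept, business term, definition ID) rows from the table above.
GLOSSARY_ROWS = [
    ("16-A334", "License or Permit Holder", "Licensee", "BT-977"),
    ("16-A334", "License or Permit Holder", "Permit Holder", "BT-983"),
]


def terms_by_concept(rows):
    """Group business terms and their definition IDs under each concept."""
    grouped = defaultdict(list)
    for concept_id, concept, term, definition_id in rows:
        grouped[(concept_id, concept)].append((term, definition_id))
    return dict(grouped)


def shared_concepts(rows):
    """Concepts referenced by more than one business term or definition."""
    return {key: terms for key, terms in terms_by_concept(rows).items()
            if len(terms) > 1}


candidates = shared_concepts(GLOSSARY_ROWS)
for (concept_id, concept), terms in candidates.items():
    print(concept_id, concept, terms)
```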

Data Harmonization

Extract & Collect: Extract data definitions from current guidance documents, etc., and categorize definitions using standard terminology

Integrate & Collate: Integrate data elements into a single reference source, then combine common data elements

Identify Anomalies: Identify inconsistent definitions, conflicts in data domains, format variations

Resolve and Standardize: Identify authoritative source for definitions. Assign names using Naming Convention. Document in metadata Registry

Source inputs: regulations, policies, documents, forms


Page 5: Metadata Melodies Webinar with David Loshin Presentation

Example ndash Identifying Business Terms

Order ConfirmationIf you do not receive a confirmation number (in the form of a confirmation page or email) after submitting payment information or if you experience an error message or service interruption after submitting payment information it is your responsibility to confirm with FizzDizzleCustomer Service whether or not your order has been placed

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

5

bull Youbull Confirmation numberbull Confirmation pagebull Confirmation emailbull Payment informationbull Error messagebull Service interruptionbull FizzDizzle Customer Servicebull Order

Nouns

Example ndash Identifying Business Terms

Order ConfirmationIf you do not receive a confirmation number (in the form of a confirmation page or email) after submitting payment information or if you experience an error message or service interruption after submitting payment information it is your responsibility to confirm with FizzDizzleCustomer Service whether or not your order has been placed

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

6

bull Receivebull Submittingbull Experiencebull Confirmbull Placed

Verbs

Entities amp Characteristics

Entities are core concepts that are mapped to conceptual data domain models such as

Customer Organization Order Product

The conceptual Data Domain is mapped to a container such as

File table object

Characteristics are attributes of entities modeled as data element concepts

Data element concepts are mapped to data elements

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

7

Characteristics

A data element concept has values taken from a conceptual domain

One data element concept might be mapped to more than one instantiation as a data element

A data element has values taken from a value domain

A conceptual domain might be mapped to more than one instantiation as a value domain

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

8

More on Entities amp Characteristics

An entity may have many characteristics

A data domain may refer to many data element concepts

An instantiated container may have many data elements

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

9

Enumeration of Data Elements

Each data element concept is manifested as one or more data elements in a specific system

The template is used to map data element concepts to used data elements

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

10

Data Element Identifier

NameData

Element Concept

UsageConceptual

DomainValue

DomainStorage

Data TypePresentation

Data TypeUnit of

MeasureBusiness

Rules

20988Customer State State Salesforcecom CST VST-1 Char(2) Char(2) NA

May not be null

Business Terms and Data Element Concepts

Map use of a business term to a definition then to the entity or characteristic

Customer is used in reference to the customer entity

Account Number is used in reference to an attribute of a customer entity

Need to track list of data element concepts

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

11

Concept ID Concept Business Term Definition ID

16-A334License or Permit Holder Licensee BT-977

16-A334License or Permit Holder Permit Holder BT-983

Data Harmonization

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

12

Identify inconsistent definitions conflicts

in data domains format variations

Extract data definitions from current guidance documents etc and categorize definitions

using standard terminology

Integrate data elements into a single reference

source then combine common

data elements

Identify authoritative source

for definitionsAssign names using Naming Convention

Document in metadata Registry

Identify AnomaliesResolve and Standardize

Integrate amp CollateExtract amp Collect

Regu

latio

ns

Po

licie

s

Documents

Forms

Extract amp Collect

Collected metadata includes

Assigned identifier

Data element name

Related business terms

Definition

Data type

Length

Business rules

IssuesComments

Reference domain

Standard name

Authoritative sources

Lineage

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

13

Data Element

Type

FirstName VARCHAR(35)

LastName VARCHAR(40)

SSN CHAR(11)

Telephone VARCHAR(20)

Data Element

Type

First VARCHAR(25)

Middle VARCHAR(25)

Last VARCHAR(30)

SocialSec CHAR(9)

Understanding Reference Data - Assessment

copy 2013 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

14

Jean Montard 0 062672

Michael Evans 0 112168

Fran Peterson 1 030276

Pat Lawson 1 041779

J Montard M 062672

M Evans M 112168

F Peterson F 030276

P Lawson F 041779

bull Each of these data sets have matching records for unique individuals

bull Each has a code value with a 111 correspondence

bull Understand why the values differ in each data set

Jean J Montard F 062672

Michael D Evans F 112168

Fran S Peterson M 030276

Pat O Lawson M 041779

15

Integrate amp Collate

CORP-

QWODR-25

Quarterly Wage

Employee Wage

Amount

11 AN This field will contain the

information as provided from

the Quarterly Wage record

submitted for State Filing

PROPOSED State

Corporate

Quarterly Wage

OUTPUT DETAIL

RECORD

EMPR-385 WAGE AMOUNT The amount of a

personrsquos wages

during a

Reporting

Quarter

11 Signed

Numeric

00000000000 through

99999999999 The last two

positions are implied to be to

the right of the decimal point

Conditional for the following

output record Federal

Employee Locate Response

Record

Federal Match

System

ID Data Element

Name

Definition L T Data Element Rules IssuesCom

ments

Mapping Authoritative

Source

CORP-

QWODR-25

Quarterly Wage

Employee Wage

Amount

11 AN This field will contain the

information as provided from

the Quarterly Wage record

submitted for State Filing

PROPOSED State

Corporate

Quarterly Wage

OUTPUT DETAIL

RECORD

EMPR-385 WAGE AMOUNT The amount of a

personrsquos wages

during a

Reporting

Quarter

11 Signed

Numeric

00000000000 through

99999999999 The last two

positions are implied to be to

the right of the decimal point

Conditional for the following

output record Federal

Employee Locate Response

Record

Federal Match

System

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

Identify Anomalies

Inconsistency or ambiguity for similarly-named data elements

Inconsistency of explicit data element business rules

Incomplete or inconsistent reference value domains

Inconsistent formats

Conflicting data types

Abbreviations vs full names

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

16

ID Data Element

Name

Definition L T Data Element Rules Mapping

SWA-UI-

OD-15

Claimant State Lacking definition 2 AN If present this field will contain the Claimant

State code as provided on the submitted UI

record

State UI Output Detail

Record

IRS-DME-

15

Sex Lacking definition 1 AN Sex Code from the Person Table will contain

spaces if not present on the Person Table

Values not specified

IRS DATA MATCH

EXTRACT RECORD

FCR-69 Benefit Amount The monetary amount of Unemployment

Insurance benefits a person received during

a Reporting Period This definition

does not specify the Reporting

Period SWA specify quarter

[see below]

11 AN 00000000000 through 99999999999 The last

two positions are implied to be to

the right of the decimal point This field

will contain all zeroes when there is no Benefit

Amount or the information is not available

Conditional for the following output record bull

Federal Match Record

SWA-OD-

19

Benefit Amount This field will contain the gross amount of UI

benefits prior to any deductions paid to a

claimant during the reporting quarter as

provided on the UI record submitted to the

NDNH

11 AN Values are 00000000000 through 99999999999

without decimal

This field is whole dollars only

Potential conflict with FCR-69

SWA UI Output Detail

Record

Resolve amp Standardize

Identify authoritative

sources

Prioritize potential

harmonized definitions

Review with subject matter experts

Consolidate if possible

Differentiate if necessary

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

17

Isomorphic Domains

We can say that value domains A and B are isomorphic if

The cardinality of A is equal to the cardinality of B (they have the same number of values)

Both A and B are associated with a conceptual domain C with the same cardinality as A and B

There is an enumeration of value meanings in A that is a 1-1 mapping to concepts in C

There is an enumeration of value meanings in B that is a 1-1 mapping to concepts in C

In other words if two value domains have a one-to-one mapping to the same conceptual domain values they are isomorphic

Isomorphic domains can be harmonized

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

18

Domain Congruence

Two value domains are congruent if their value sets intersect and the size of the intersection meets or exceeds a predefined threshold

Example

FIPS 2-Character State Codes contain values for all US States

USPS 2-Character State Codes contain values for all US States as well as AA AE and AP postal codes for military delivery

Under certain circumstances the two domains can be harmonized

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

19

Encouraging a Culture of Semantic Harmony

Small variance in definitions in isolated functions become magnified when data is shared across functions

Establish a level playing ground by

Instituting a common business term glossary

Harmonizing business term definitions

Unifying shared reference data into conceptual domains and corresponding value domains

Socializing use of shared metadata

Establishing standards for future development

Integrate methods for monitoring compliance with standards

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

20

Check Out These Resources

wwwknowledge-integritycom

wwwdataqualitybookcom

If you have questions comments or suggestions please contact me

David Loshin

301-754-6350

loshinknowledge-integritycom

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301) 754-6350

21

Page 6: Metadata Melodies Webinar with David Loshin Presentation

Example ndash Identifying Business Terms

Order ConfirmationIf you do not receive a confirmation number (in the form of a confirmation page or email) after submitting payment information or if you experience an error message or service interruption after submitting payment information it is your responsibility to confirm with FizzDizzleCustomer Service whether or not your order has been placed

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

6

bull Receivebull Submittingbull Experiencebull Confirmbull Placed

Verbs

Entities amp Characteristics

Entities are core concepts that are mapped to conceptual data domain models such as

Customer Organization Order Product

The conceptual Data Domain is mapped to a container such as

File table object

Characteristics are attributes of entities modeled as data element concepts

Data element concepts are mapped to data elements

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

7

Characteristics

A data element concept has values taken from a conceptual domain

One data element concept might be mapped to more than one instantiation as a data element

A data element has values taken from a value domain

A conceptual domain might be mapped to more than one instantiation as a value domain

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

8

More on Entities amp Characteristics

An entity may have many characteristics

A data domain may refer to many data element concepts

An instantiated container may have many data elements

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

9

Enumeration of Data Elements

Each data element concept is manifested as one or more data elements in a specific system

The template is used to map data element concepts to used data elements

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

10

Data Element Identifier

NameData

Element Concept

UsageConceptual

DomainValue

DomainStorage

Data TypePresentation

Data TypeUnit of

MeasureBusiness

Rules

20988Customer State State Salesforcecom CST VST-1 Char(2) Char(2) NA

May not be null

Business Terms and Data Element Concepts

Map use of a business term to a definition then to the entity or characteristic

Customer is used in reference to the customer entity

Account Number is used in reference to an attribute of a customer entity

Need to track list of data element concepts

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

11

Concept ID Concept Business Term Definition ID

16-A334License or Permit Holder Licensee BT-977

16-A334License or Permit Holder Permit Holder BT-983

Data Harmonization

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

12

Identify inconsistent definitions conflicts

in data domains format variations

Extract data definitions from current guidance documents etc and categorize definitions

using standard terminology

Integrate data elements into a single reference

source then combine common

data elements

Identify authoritative source

for definitionsAssign names using Naming Convention

Document in metadata Registry

Identify AnomaliesResolve and Standardize

Integrate amp CollateExtract amp Collect

Regu

latio

ns

Po

licie

s

Documents

Forms

Extract amp Collect

Collected metadata includes

Assigned identifier

Data element name

Related business terms

Definition

Data type

Length

Business rules

IssuesComments

Reference domain

Standard name

Authoritative sources

Lineage

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

13

Data Element

Type

FirstName VARCHAR(35)

LastName VARCHAR(40)

SSN CHAR(11)

Telephone VARCHAR(20)

Data Element

Type

First VARCHAR(25)

Middle VARCHAR(25)

Last VARCHAR(30)

SocialSec CHAR(9)

Understanding Reference Data - Assessment

copy 2013 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

14

Jean Montard 0 062672

Michael Evans 0 112168

Fran Peterson 1 030276

Pat Lawson 1 041779

J Montard M 062672

M Evans M 112168

F Peterson F 030276

P Lawson F 041779

bull Each of these data sets have matching records for unique individuals

bull Each has a code value with a 111 correspondence

bull Understand why the values differ in each data set

Jean J Montard F 062672

Michael D Evans F 112168

Fran S Peterson M 030276

Pat O Lawson M 041779

15

Integrate amp Collate

CORP-

QWODR-25

Quarterly Wage

Employee Wage

Amount

11 AN This field will contain the

information as provided from

the Quarterly Wage record

submitted for State Filing

PROPOSED State

Corporate

Quarterly Wage

OUTPUT DETAIL

RECORD

EMPR-385 WAGE AMOUNT The amount of a

personrsquos wages

during a

Reporting

Quarter

11 Signed

Numeric

00000000000 through

99999999999 The last two

positions are implied to be to

the right of the decimal point

Conditional for the following

output record Federal

Employee Locate Response

Record

Federal Match

System

ID Data Element

Name

Definition L T Data Element Rules IssuesCom

ments

Mapping Authoritative

Source

CORP-

QWODR-25

Quarterly Wage

Employee Wage

Amount

11 AN This field will contain the

information as provided from

the Quarterly Wage record

submitted for State Filing

PROPOSED State

Corporate

Quarterly Wage

OUTPUT DETAIL

RECORD

EMPR-385 WAGE AMOUNT The amount of a

personrsquos wages

during a

Reporting

Quarter

11 Signed

Numeric

00000000000 through

99999999999 The last two

positions are implied to be to

the right of the decimal point

Conditional for the following

output record Federal

Employee Locate Response

Record

Federal Match

System

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

Identify Anomalies

Inconsistency or ambiguity for similarly-named data elements

Inconsistency of explicit data element business rules

Incomplete or inconsistent reference value domains

Inconsistent formats

Conflicting data types

Abbreviations vs full names

ID | Data Element Name | Definition | L (length) | T (type) | Data Element Rules | Mapping

ID: SWA-UI-OD-15
Data Element Name: Claimant State
Definition: Lacking definition
L: 2
T: AN
Data Element Rules: If present, this field will contain the Claimant State code as provided on the submitted UI record
Mapping: State UI Output Detail Record

ID: IRS-DME-15
Data Element Name: Sex
Definition: Lacking definition
L: 1
T: AN
Data Element Rules: Sex Code from the Person Table; will contain spaces if not present on the Person Table. Values not specified
Mapping: IRS DATA MATCH EXTRACT RECORD

ID: FCR-69
Data Element Name: Benefit Amount
Definition: The monetary amount of Unemployment Insurance benefits a person received during a Reporting Period. This definition does not specify the Reporting Period; SWA specify quarter [see below]
L: 11
T: AN
Data Element Rules: 00000000000 through 99999999999. The last two positions are implied to be to the right of the decimal point. This field will contain all zeroes when there is no Benefit Amount or the information is not available. Conditional for the following output record: Federal Match Record

ID: SWA-OD-19
Data Element Name: Benefit Amount
Definition: This field will contain the gross amount of UI benefits, prior to any deductions, paid to a claimant during the reporting quarter, as provided on the UI record submitted to the NDNH
L: 11
T: AN
Data Element Rules: Values are 00000000000 through 99999999999 without decimal. This field is whole dollars only. Potential conflict with FCR-69
Mapping: SWA UI Output Detail Record
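The potential conflict between the two Benefit Amount elements can be made concrete with a small sketch. It assumes each field arrives as an 11-character digit string; the function names are hypothetical and are not part of any of the record layouts quoted above.

```python
from decimal import Decimal


def _check_field(field: str) -> None:
    """Reject anything that is not exactly 11 ASCII digits."""
    if len(field) != 11 or not (field.isascii() and field.isdigit()):
        raise ValueError(f"expected 11 digits, got {field!r}")


def parse_implied_decimal(field: str) -> Decimal:
    """FCR-69 style rule: the last two positions sit right of the decimal point."""
    _check_field(field)
    return Decimal(field) / Decimal(100)


def parse_whole_dollars(field: str) -> Decimal:
    """SWA-OD-19 style rule: whole dollars only, no implied decimal."""
    _check_field(field)
    return Decimal(field)


raw = "00001234567"
print(parse_implied_decimal(raw))  # amount under the implied-decimal rule
print(parse_whole_dollars(raw))    # amount under the whole-dollars rule
```

The same 11 characters differ by a factor of 100 depending on which rule is applied, which is why two elements that share a name, length, and type still cannot be merged without first reconciling their rules.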

Resolve & Standardize

Identify authoritative sources

Prioritize potential harmonized definitions

Review with subject matter experts

Consolidate if possible

Differentiate if necessary

Isomorphic Domains

We can say that value domains A and B are isomorphic if:

The cardinality of A is equal to the cardinality of B (they have the same number of values)

Both A and B are associated with a conceptual domain C with the same cardinality as A and B

There is an enumeration of value meanings in A that is a 1-1 mapping to concepts in C

There is an enumeration of value meanings in B that is a 1-1 mapping to concepts in C

In other words, if two value domains have a one-to-one mapping to the same conceptual domain values, they are isomorphic

Isomorphic domains can be harmonized
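The isomorphism conditions can be sketched as a check over two dictionaries, each mapping a value domain to the shared conceptual domain. The domains and value meanings below are hypothetical placeholders chosen for illustration, and `is_bijection`, `are_isomorphic`, and `harmonize` are names invented for this sketch.

```python
def is_bijection(value_to_concept, conceptual_domain):
    """True when the mapping hits every concept exactly once."""
    concepts = list(value_to_concept.values())
    return (len(set(concepts)) == len(concepts)
            and set(concepts) == set(conceptual_domain))


def are_isomorphic(a_to_c, b_to_c, conceptual_domain):
    """Both value domains must map 1-1 onto the same conceptual domain."""
    return (is_bijection(a_to_c, conceptual_domain)
            and is_bijection(b_to_c, conceptual_domain))


def harmonize(a_to_c, b_to_c):
    """Translate values of A into values of B by way of the shared concepts."""
    c_to_b = {concept: value for value, concept in b_to_c.items()}
    return {value: c_to_b[concept] for value, concept in a_to_c.items()}


# Hypothetical conceptual domain C and value domains A and B.
C = {"MALE", "FEMALE"}
A = {"0": "MALE", "1": "FEMALE"}
B = {"M": "MALE", "F": "FEMALE"}

if are_isomorphic(A, B, C):
    print(harmonize(A, B))
```

Because dictionary keys are unique, requiring the mapped concepts to be distinct and to cover all of C also forces A, B, and C to have the same cardinality, so the cardinality conditions do not need a separate test.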

Domain Congruence

Two value domains are congruent if their value sets intersect and the size of the intersection meets or exceeds a predefined threshold

Example

FIPS 2-Character State Codes contain values for all US States

USPS 2-Character State Codes contain values for all US States, as well as AA, AE, and AP postal codes for military delivery

Under certain circumstances the two domains can be harmonized
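A congruence test can be sketched as a set intersection compared against a threshold. The slide does not say how the threshold is expressed, so this example assumes it is a fraction of the smaller domain; the sample sets are a small illustrative subset, not the full FIPS or USPS lists, and the function names are invented here.

```python
def overlap_ratio(domain_a, domain_b):
    """Share of the smaller domain that also appears in the other domain."""
    a, b = set(domain_a), set(domain_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def are_congruent(domain_a, domain_b, threshold):
    """Congruent when the overlap meets or exceeds the chosen threshold."""
    return overlap_ratio(domain_a, domain_b) >= threshold


# Illustrative subsets only; AA, AE, and AP are the military codes named above.
FIPS_SAMPLE = {"MD", "NY", "VA"}
USPS_SAMPLE = {"MD", "NY", "VA", "AA", "AE", "AP"}

print(are_congruent(FIPS_SAMPLE, USPS_SAMPLE, threshold=0.95))
print(sorted(USPS_SAMPLE - FIPS_SAMPLE))  # values that still need a decision
```

Listing the values outside the intersection is useful because those are the cases that decide whether harmonizing the two domains is acceptable for a given use.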

Encouraging a Culture of Semantic Harmony

Small variances in definitions within isolated functions become magnified when data is shared across functions

Establish a level playing ground by:

Instituting a common business term glossary

Harmonizing business term definitions

Unifying shared reference data into conceptual domains and corresponding value domains

Socializing use of shared metadata

Establishing standards for future development

Integrating methods for monitoring compliance with standards

Check Out These Resources

www.knowledge-integrity.com

www.dataqualitybook.com

If you have questions, comments, or suggestions, please contact me:

David Loshin

301-754-6350

loshin@knowledge-integrity.com

Page 7: Metadata Melodies Webinar with David Loshin Presentation

Entities & Characteristics

Entities are core concepts that are mapped to conceptual data domain models, such as Customer, Organization, Order, Product

The conceptual data domain is mapped to a container, such as a file, table, or object

Characteristics are attributes of entities, modeled as data element concepts

Data element concepts are mapped to data elements

Characteristics

A data element concept has values taken from a conceptual domain

One data element concept might be mapped to more than one instantiation as a data element

A data element has values taken from a value domain

A conceptual domain might be mapped to more than one instantiation as a value domain

More on Entities & Characteristics

An entity may have many characteristics

A data domain may refer to many data element concepts

An instantiated container may have many data elements

Enumeration of Data Elements

Each data element concept is manifested as one or more data elements in a specific system

The template is used to map data element concepts to the data elements in use

Data Element Identifier: 20988
Name: Customer State
Data Element Concept: State
Usage: Salesforce.com
Conceptual Domain: CST
Value Domain: VST-1
Storage Data Type: Char(2)
Presentation Data Type: Char(2)
Unit of Measure: NA
Business Rules: May not be null
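The template row could be held in a small record type so that the elements instantiating one data element concept can be listed together. This is a sketch only: the class and field names are assumptions, the first entry copies the row above, and the second entry (identifier 20989, the "Billing" usage, and value domain VST-2) is hypothetical, added to show one concept with two instantiations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    identifier: str
    name: str
    concept: str
    usage: str
    conceptual_domain: str
    value_domain: str
    storage_type: str
    presentation_type: str
    unit_of_measure: str
    business_rules: tuple


REGISTRY = [
    # Row taken from the template above.
    DataElement("20988", "Customer State", "State", "Salesforce.com",
                "CST", "VST-1", "Char(2)", "Char(2)", "NA",
                ("May not be null",)),
    # Hypothetical second instantiation of the same concept.
    DataElement("20989", "Cust State Name", "State", "Billing",
                "CST", "VST-2", "Varchar(30)", "Varchar(30)", "NA",
                ()),
]


def elements_for_concept(registry, concept):
    """List every data element that instantiates the given concept."""
    return [element for element in registry if element.concept == concept]


for element in elements_for_concept(REGISTRY, "State"):
    print(element.identifier, element.usage, element.value_domain)
```

Grouping by concept in this way surfaces the cases where one conceptual domain has been instantiated as more than one value domain.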

Business Terms and Data Element Concepts

Map use of a business term to a definition then to the entity or characteristic

Customer is used in reference to the customer entity

Account Number is used in reference to an attribute of a customer entity

Need to track list of data element concepts

Concept ID   Concept                    Business Term   Definition ID
16-A334      License or Permit Holder   Licensee        BT-977
16-A334      License or Permit Holder   Permit Holder   BT-983

Data Harmonization

Sources: Regulations, Policies, Documents, Forms

Extract & Collect: Extract data definitions from current guidance documents, etc., and categorize definitions using standard terminology

Integrate & Collate: Integrate data elements into a single reference source, then combine common data elements

Identify Anomalies: Identify inconsistent definitions, conflicts in data domains, format variations

Resolve and Standardize: Identify authoritative source for definitions; assign names using Naming Convention; document in metadata Registry

Extract & Collect

Collected metadata includes

Assigned identifier

Data element name

Related business terms

Definition

Data type

Length

Business rules

Issues/Comments

Reference domain

Standard name

Authoritative sources

Lineage

Data Element   Type
FirstName      VARCHAR(35)
LastName       VARCHAR(40)
SSN            CHAR(11)
Telephone      VARCHAR(20)

Data Element   Type
First          VARCHAR(25)
Middle         VARCHAR(25)
Last           VARCHAR(30)
SocialSec      CHAR(9)
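Differences like those between the two layouts above can be flagged mechanically once elements believed to share a concept have been paired. In this sketch the pairing in `PAIRS` is an assumption made for illustration, and `parse_type` and `find_conflicts` are hypothetical helper names.

```python
import re

SCHEMA_A = {"FirstName": "VARCHAR(35)", "LastName": "VARCHAR(40)",
            "SSN": "CHAR(11)", "Telephone": "VARCHAR(20)"}
SCHEMA_B = {"First": "VARCHAR(25)", "Middle": "VARCHAR(25)",
            "Last": "VARCHAR(30)", "SocialSec": "CHAR(9)"}

# Assumed pairing of elements that appear to represent the same concept.
PAIRS = [("FirstName", "First"), ("LastName", "Last"), ("SSN", "SocialSec")]


def parse_type(declaration):
    """Split a declaration such as 'VARCHAR(35)' into ('VARCHAR', 35)."""
    match = re.fullmatch(r"([A-Z]+)\((\d+)\)", declaration)
    if match is None:
        raise ValueError(f"unrecognized type declaration: {declaration!r}")
    return match.group(1), int(match.group(2))


def find_conflicts(schema_a, schema_b, pairs):
    """Report paired elements whose base type or declared length differs."""
    conflicts = []
    for left, right in pairs:
        left_type, left_len = parse_type(schema_a[left])
        right_type, right_len = parse_type(schema_b[right])
        if left_type != right_type:
            conflicts.append((left, right, "type"))
        elif left_len != right_len:
            conflicts.append((left, right, "length"))
    return conflicts


for left, right, kind in find_conflicts(SCHEMA_A, SCHEMA_B, PAIRS):
    print(f"{left} vs {right}: {kind} conflict")
```

A length conflict such as CHAR(11) against CHAR(9) may point to a format difference (for example, stored separators) rather than a different concept, so each flagged pair still needs review by someone who knows the data.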

Page 10: Metadata Melodies Webinar with David Loshin Presentation

Enumeration of Data Elements

Each data element concept is manifested as one or more data elements in a specific system

The template is used to map data element concepts to used data elements

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

10

Data Element Identifier

NameData

Element Concept

UsageConceptual

DomainValue

DomainStorage

Data TypePresentation

Data TypeUnit of

MeasureBusiness

Rules

20988Customer State State Salesforcecom CST VST-1 Char(2) Char(2) NA

May not be null

Business Terms and Data Element Concepts

Map use of a business term to a definition then to the entity or characteristic

Customer is used in reference to the customer entity

Account Number is used in reference to an attribute of a customer entity

Need to track list of data element concepts

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

11

Concept ID Concept Business Term Definition ID

16-A334License or Permit Holder Licensee BT-977

16-A334License or Permit Holder Permit Holder BT-983

Data Harmonization

copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom

(301)754-6350

12

Sources: Regulations, Policies, Documents, Forms

Extract & Collect: Extract data definitions from current guidance documents, etc., and categorize definitions using standard terminology

Integrate & Collate: Integrate data elements into a single reference source, then combine common data elements

Identify Anomalies: Identify inconsistent definitions, conflicts in data domains, format variations

Resolve and Standardize: Identify authoritative source for definitions; assign names using naming convention; document in metadata registry

Extract & Collect

Collected metadata includes:

Assigned identifier

Data element name

Related business terms

Definition

Data type

Length

Business rules

Issues/Comments

Reference domain

Standard name

Authoritative sources

Lineage

Data Element   Type
FirstName      VARCHAR(35)
LastName       VARCHAR(40)
SSN            CHAR(11)
Telephone      VARCHAR(20)

Data Element   Type
First          VARCHAR(25)
Middle         VARCHAR(25)
Last           VARCHAR(30)
SocialSec      CHAR(9)
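The two element lists above can be compared mechanically once their names are aligned. The following is a minimal sketch: the `ALIGNMENT` mapping between element names is a hypothetical, hand-curated assumption (the slide does not state which elements correspond), and the function names are illustrative.

```python
import re

# The two element lists from the slide, as name -> declared type.
SOURCE_A = {"FirstName": "VARCHAR(35)", "LastName": "VARCHAR(40)",
            "SSN": "CHAR(11)", "Telephone": "VARCHAR(20)"}
SOURCE_B = {"First": "VARCHAR(25)", "Middle": "VARCHAR(25)",
            "Last": "VARCHAR(30)", "SocialSec": "CHAR(9)"}

# Hypothetical alignment of names (an assumption, normally curated by a person).
ALIGNMENT = {"FirstName": "First", "LastName": "Last", "SSN": "SocialSec"}


def parse_type(declared):
    """Split a declaration such as 'VARCHAR(35)' into ('VARCHAR', 35)."""
    match = re.fullmatch(r"([A-Z]+)\((\d+)\)", declared)
    if match is None:
        raise ValueError(f"unrecognized type declaration: {declared!r}")
    return match.group(1), int(match.group(2))


def compare_elements(source_a, source_b, alignment):
    """Report type or length differences for aligned elements, plus unmatched names."""
    findings = []
    for name_a, name_b in alignment.items():
        base_a, length_a = parse_type(source_a[name_a])
        base_b, length_b = parse_type(source_b[name_b])
        if base_a != base_b:
            findings.append((name_a, name_b, "conflicting data type"))
        elif length_a != length_b:
            findings.append((name_a, name_b, f"length {length_a} vs {length_b}"))
    unmatched_a = sorted(set(source_a) - set(alignment))
    unmatched_b = sorted(set(source_b) - set(alignment.values()))
    return findings, unmatched_a, unmatched_b


findings, only_in_a, only_in_b = compare_elements(SOURCE_A, SOURCE_B, ALIGNMENT)
for row in findings:
    print(row)
print("only in first list:", only_in_a)
print("only in second list:", only_in_b)
```

Under this alignment every matched pair differs in length, and each list has one element with no counterpart, which are the kinds of issues the collected metadata is meant to surface.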

Understanding Reference Data - Assessment

Jean Montard 0 062672
Michael Evans 0 112168
Fran Peterson 1 030276
Pat Lawson 1 041779

J Montard M 062672
M Evans M 112168
F Peterson F 030276
P Lawson F 041779

Jean J Montard F 062672
Michael D Evans F 112168
Fran S Peterson M 030276
Pat O Lawson M 041779

• Each of these data sets has matching records for unique individuals

• Each has a code value with a 1:1:1 correspondence

• Understand why the values differ in each data set
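The correspondence between the code values can be derived by matching records across the data sets. The sketch below assumes, for illustration only, that records match on last name plus the six-digit field; the tuple layout and function names are hypothetical.

```python
# Each record is (last_name, code, six_digit_field), taken from the three data sets.
SET_1 = [("Montard", "0", "062672"), ("Evans", "0", "112168"),
         ("Peterson", "1", "030276"), ("Lawson", "1", "041779")]
SET_2 = [("Montard", "M", "062672"), ("Evans", "M", "112168"),
         ("Peterson", "F", "030276"), ("Lawson", "F", "041779")]
SET_3 = [("Montard", "F", "062672"), ("Evans", "F", "112168"),
         ("Peterson", "M", "030276"), ("Lawson", "M", "041779")]


def code_mapping(left, right):
    """Map each code in `left` to the set of codes on the matching records in `right`."""
    right_by_key = {(last, field): code for last, code, field in right}
    mapping = {}
    for last, code, field in left:
        mapping.setdefault(code, set()).add(right_by_key[(last, field)])
    return mapping


def is_one_to_one(mapping):
    """True if every code maps to exactly one code and no two codes share a target."""
    if any(len(targets) != 1 for targets in mapping.values()):
        return False
    images = [next(iter(targets)) for targets in mapping.values()]
    return len(set(images)) == len(images)


map_1_2 = code_mapping(SET_1, SET_2)
map_1_3 = code_mapping(SET_1, SET_3)
map_2_3 = code_mapping(SET_2, SET_3)
print(map_1_2, is_one_to_one(map_1_2))
print(map_1_3, is_one_to_one(map_1_3))
print(map_2_3, is_one_to_one(map_2_3))
```

The mappings are one-to-one in every direction, yet the second and third data sets use the same two symbols with opposite assignments. A structural check like this confirms the correspondence but cannot say what each code means; that still has to come from the definitions.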

Integrate & Collate

Columns: ID | Data Element Name | Definition | L | T | Data Element Rules | Issues/Comments | Mapping | Authoritative Source

ID: CORP-QWODR-25
Data Element Name: Quarterly Wage Employee Wage Amount
L: 11
T: AN
Data Element Rules: This field will contain the information as provided from the Quarterly Wage record submitted for State Filing
Mapping / Authoritative Source: PROPOSED State Corporate Quarterly Wage OUTPUT DETAIL RECORD

ID: EMPR-385
Data Element Name: WAGE AMOUNT
Definition: The amount of a person's wages during a Reporting Quarter
L: 11
T: Signed Numeric
Data Element Rules: 00000000000 through 99999999999. The last two positions are implied to be to the right of the decimal point. Conditional for the following output record: Federal Employee Locate Response Record
Mapping / Authoritative Source: Federal Match System

Identify Anomalies

Inconsistency or ambiguity for similarly-named data elements

Inconsistency of explicit data element business rules

Incomplete or inconsistent reference value domains

Inconsistent formats

Conflicting data types

Abbreviations vs full names

Columns: ID | Data Element Name | Definition | L | T | Data Element Rules | Mapping

ID: SWA-UI-OD-15
Data Element Name: Claimant State
Definition: Lacking definition
L: 2
T: AN
Data Element Rules: If present, this field will contain the Claimant State code as provided on the submitted UI record
Mapping: State UI Output Detail Record

ID: IRS-DME-15
Data Element Name: Sex
Definition: Lacking definition
L: 1
T: AN
Data Element Rules: Sex Code from the Person Table; will contain spaces if not present on the Person Table. Values not specified
Mapping: IRS DATA MATCH EXTRACT RECORD

ID: FCR-69
Data Element Name: Benefit Amount
Definition: The monetary amount of Unemployment Insurance benefits a person received during a Reporting Period. This definition does not specify the Reporting Period; SWA specify quarter [see below]
L: 11
T: AN
Data Element Rules: 00000000000 through 99999999999. The last two positions are implied to be to the right of the decimal point. This field will contain all zeroes when there is no Benefit Amount or the information is not available. Conditional for the following output record: Federal Match Record

ID: SWA-OD-19
Data Element Name: Benefit Amount
Definition: This field will contain the gross amount of UI benefits, prior to any deductions, paid to a claimant during the reporting quarter, as provided on the UI record submitted to the NDNH
L: 11
T: AN
Data Element Rules: Values are 00000000000 through 99999999999 without decimal. This field is whole dollars only. Potential conflict with FCR-69
Mapping: SWA UI Output Detail Record
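The potential conflict between FCR-69 and SWA-OD-19 is easy to see by decoding the same 11-character field under both rules. This is a minimal sketch; the function names and the sample field value are made up for illustration.

```python
from decimal import Decimal


def _check_field(field):
    """Both rules describe an 11-position field holding only digits."""
    if len(field) != 11 or not (field.isascii() and field.isdigit()):
        raise ValueError(f"expected 11 digits, got {field!r}")


def decode_fcr_69(field):
    """FCR-69 rule: the last two positions are implied to be right of the decimal point."""
    _check_field(field)
    return Decimal(int(field)) / 100


def decode_swa_od_19(field):
    """SWA-OD-19 rule: whole dollars only, no implied decimal."""
    _check_field(field)
    return Decimal(int(field))


raw = "00000123456"  # hypothetical field value
as_fcr = decode_fcr_69(raw)
as_swa = decode_swa_od_19(raw)
print(as_fcr, as_swa)
```

The same stored characters differ by a factor of 100 depending on which rule is applied, so two elements both named Benefit Amount cannot be merged until the format rule is settled.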

Resolve & Standardize

Identify authoritative sources

Prioritize potential harmonized definitions

Review with subject matter experts

Consolidate if possible

Differentiate if necessary

Isomorphic Domains

We can say that value domains A and B are isomorphic if:

The cardinality of A is equal to the cardinality of B (they have the same number of values)

Both A and B are associated with a conceptual domain C with the same cardinality as A and B

There is an enumeration of value meanings in A that is a 1-1 mapping to concepts in C

There is an enumeration of value meanings in B that is a 1-1 mapping to concepts in C

In other words, if two value domains have a one-to-one mapping to the same conceptual domain values, they are isomorphic

Isomorphic domains can be harmonized
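The isomorphism conditions can be checked directly when each value domain is stored as a mapping from value to value meaning. The sketch below uses a hypothetical yes/no conceptual domain; the domain contents and function names are assumptions for illustration.

```python
def maps_one_to_one_onto(value_meanings, concepts):
    """True if the value -> concept mapping hits every concept exactly once."""
    meanings = list(value_meanings.values())
    return len(set(meanings)) == len(meanings) and set(meanings) == set(concepts)


def are_isomorphic(domain_a, domain_b, concepts):
    """Both domains map 1-1 onto the same conceptual domain (equal cardinality follows)."""
    return (maps_one_to_one_onto(domain_a, concepts)
            and maps_one_to_one_onto(domain_b, concepts))


def translation(domain_a, domain_b):
    """Build an A-value -> B-value lookup through the shared concepts."""
    concept_to_b = {concept: value for value, concept in domain_b.items()}
    return {value: concept_to_b[concept] for value, concept in domain_a.items()}


# Hypothetical conceptual domain C and two value domains A and B.
CONCEPTS = {"yes", "no"}
DOMAIN_A = {"Y": "yes", "N": "no"}
DOMAIN_B = {"1": "yes", "0": "no"}
DOMAIN_WITH_EXTRA = {"Y": "yes", "N": "no", "U": "unknown"}

isomorphic = are_isomorphic(DOMAIN_A, DOMAIN_B, CONCEPTS)
a_to_b = translation(DOMAIN_A, DOMAIN_B)
print(isomorphic, a_to_b)
print(are_isomorphic(DOMAIN_WITH_EXTRA, DOMAIN_B, CONCEPTS))
```

When the check passes, the translation table is what makes harmonization practical: either domain can be rewritten in terms of the other without losing information.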

Domain Congruence

Two value domains are congruent if their value sets intersect and the size of the intersection meets or exceeds a predefined threshold

Example:

FIPS 2-Character State Codes contain values for all US States

USPS 2-Character State Codes contain values for all US States, as well as AA, AE, and AP postal codes for military delivery

Under certain circumstances the two domains can be harmonized
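A congruence test can be sketched with set operations. Two assumptions are made here that the slide does not specify: the threshold is expressed as the share of the smaller domain that falls in the intersection, and only a three-state sample of each code set is listed to keep the example short.

```python
# Small samples for illustration only; the real domains cover all US states.
FIPS_SAMPLE = {"MD", "NY", "VA"}
USPS_SAMPLE = {"MD", "NY", "VA", "AA", "AE", "AP"}  # adds military delivery codes


def overlap_ratio(domain_a, domain_b):
    """Share of the smaller domain that also appears in the other domain."""
    if not domain_a or not domain_b:
        return 0.0
    return len(domain_a & domain_b) / min(len(domain_a), len(domain_b))


def are_congruent(domain_a, domain_b, threshold):
    """Congruent if the overlap meets or exceeds the predefined threshold."""
    return overlap_ratio(domain_a, domain_b) >= threshold


THRESHOLD = 0.9  # hypothetical value; the threshold is a policy choice
congruent = are_congruent(FIPS_SAMPLE, USPS_SAMPLE, THRESHOLD)
only_in_usps = USPS_SAMPLE - FIPS_SAMPLE
print(congruent, sorted(only_in_usps))
```

The values outside the intersection are what decide whether harmonization is appropriate: if the consuming applications never need the military delivery codes, the two domains can be treated as one; otherwise they stay differentiated.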

Encouraging a Culture of Semantic Harmony

Small variances in definitions within isolated functions become magnified when data is shared across functions

Establish a level playing field by:

Instituting a common business term glossary

Harmonizing business term definitions

Unifying shared reference data into conceptual domains and corresponding value domains

Socializing use of shared metadata

Establishing standards for future development

Integrating methods for monitoring compliance with standards

Check Out These Resources

www.knowledge-integrity.com

www.dataqualitybook.com

If you have questions, comments, or suggestions, please contact me:

David Loshin

301-754-6350

loshin@knowledge-integrity.com

Page 15: Metadata Melodies Webinar with David Loshin Presentation

Integrate & Collate

Columns: ID, Data Element Name, Definition, L, T, Data Element Rules, Issues/Comments, Mapping, Authoritative Source

ID: CORP-QWODR-25
Data Element Name: Quarterly Wage Employee Wage Amount
Definition: (none given)
L: 11
T: AN
Data Element Rules: This field will contain the information as provided from the Quarterly Wage record submitted for State Filing
Mapping / Authoritative Source: PROPOSED State Corporate Quarterly Wage OUTPUT DETAIL RECORD

ID: EMPR-385
Data Element Name: WAGE AMOUNT
Definition: The amount of a person's wages during a Reporting Quarter
L: 11
T: Signed Numeric
Data Element Rules: 00000000000 through 99999999999. The last two positions are implied to be to the right of the decimal point. Conditional for the following output record: Federal Employee Locate Response Record
Mapping / Authoritative Source: Federal Match System
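Collating the two wage-amount entries side by side can be sketched in code. The following is a minimal illustration, not part of the presentation: the `DataElement` class and the `metadata_differences` helper are hypothetical names, and it assumes that the L and T columns hold the field length and data type.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class DataElement:
    """One row of the collated metadata table (hypothetical structure)."""
    element_id: str
    name: str
    length: int                       # assumed meaning of the "L" column
    data_type: str                    # assumed meaning of the "T" column
    implied_decimals: Optional[int]   # None when the rules do not say


def metadata_differences(a, b, ignore=("element_id", "name")):
    """Return {attribute: (value_in_a, value_in_b)} for attributes that differ."""
    diffs = {}
    for f in fields(DataElement):
        if f.name in ignore:
            continue
        va, vb = getattr(a, f.name), getattr(b, f.name)
        if va != vb:
            diffs[f.name] = (va, vb)
    return diffs


# Values transcribed from the two rows on this slide.
state_wage = DataElement(
    "CORP-QWODR-25", "Quarterly Wage Employee Wage Amount", 11, "AN", None)
federal_wage = DataElement(
    "EMPR-385", "WAGE AMOUNT", 11, "Signed Numeric", 2)

for attr, (left, right) in metadata_differences(state_wage, federal_wage).items():
    print(f"{attr}: {left!r} vs {right!r}")
```

Both elements are 11 positions long, so only the data type and the decimal convention are reported as differences.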

Page 16: Metadata Melodies Webinar with David Loshin Presentation

Identify Anomalies

Inconsistency or ambiguity for similarly-named data elements

Inconsistency of explicit data element business rules

Incomplete or inconsistent reference value domains

Inconsistent formats

Conflicting data types

Abbreviations vs full names

Columns: ID, Data Element Name, Definition, L, T, Data Element Rules, Mapping

ID: SWA-UI-OD-15
Data Element Name: Claimant State
Definition: Lacking definition
L: 2
T: AN
Data Element Rules: If present, this field will contain the Claimant State code as provided on the submitted UI record
Mapping: State UI Output Detail Record

ID: IRS-DME-15
Data Element Name: Sex
Definition: Lacking definition
L: 1
T: AN
Data Element Rules: Sex Code from the Person Table; will contain spaces if not present on the Person Table. Values not specified
Mapping: IRS DATA MATCH EXTRACT RECORD

ID: FCR-69
Data Element Name: Benefit Amount
Definition: The monetary amount of Unemployment Insurance benefits a person received during a Reporting Period. This definition does not specify the Reporting Period; SWA specify quarter [see below]
L: 11
T: AN
Data Element Rules: 00000000000 through 99999999999. The last two positions are implied to be to the right of the decimal point. This field will contain all zeroes when there is no Benefit Amount or the information is not available. Conditional for the following output record: Federal Match Record

ID: SWA-OD-19
Data Element Name: Benefit Amount
Definition: This field will contain the gross amount of UI benefits, prior to any deductions, paid to a claimant during the reporting quarter, as provided on the UI record submitted to the NDNH
L: 11
T: AN
Data Element Rules: Values are 00000000000 through 99999999999 without decimal. This field is whole dollars only. Potential conflict with FCR-69
Mapping: SWA UI Output Detail Record
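The potential conflict between FCR-69 and SWA-OD-19 can be made concrete with a short sketch. This is an illustration under stated assumptions rather than anything shown in the presentation: `decode_amount` is a hypothetical helper, and the sample value is invented. It reads the same 11-character string once with two implied decimal places (the FCR-69 rule) and once as whole dollars (the SWA-OD-19 rule).

```python
from decimal import Decimal


def decode_amount(raw: str, implied_decimals: int) -> Decimal:
    """Decode an 11-digit amount field with the given number of implied decimals."""
    if len(raw) != 11 or not raw.isdigit():
        raise ValueError(f"expected 11 digits, got {raw!r}")
    # scaleb shifts the decimal point without any floating-point rounding.
    return Decimal(int(raw)).scaleb(-implied_decimals)


raw_value = "00000012345"  # invented sample value

as_fcr_69 = decode_amount(raw_value, implied_decimals=2)     # 123.45
as_swa_od_19 = decode_amount(raw_value, implied_decimals=0)  # 12345

print(as_fcr_69, as_swa_od_19)
```

The two readings differ by a factor of 100, which is why two elements that share a name, length, and type still cannot be merged until the decimal convention is settled.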

Page 17: Metadata Melodies Webinar with David Loshin Presentation

Resolve & Standardize

Identify authoritative sources

Prioritize potential harmonized definitions

Review with subject matter experts

Consolidate if possible

Differentiate if necessary

Page 18: Metadata Melodies Webinar with David Loshin Presentation

Isomorphic Domains

We can say that value domains A and B are isomorphic if:

The cardinality of A is equal to the cardinality of B (they have the same number of values)

Both A and B are associated with a conceptual domain C with the same cardinality as A and B

There is an enumeration of value meanings in A that is a 1-1 mapping to concepts in C

There is an enumeration of value meanings in B that is a 1-1 mapping to concepts in C

In other words, if two value domains have a one-to-one mapping to the same conceptual domain values, they are isomorphic

Isomorphic domains can be harmonized
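The conditions above can be checked mechanically. The sketch below is one possible reading, not the presenter's implementation: each value domain is modeled as a dictionary from value to concept, and the function names and sample codes are hypothetical.

```python
def is_one_to_one_onto(value_meanings: dict, concepts: set) -> bool:
    """True if every concept is the meaning of exactly one value."""
    meanings = list(value_meanings.values())
    return len(set(meanings)) == len(meanings) and set(meanings) == concepts


def are_isomorphic(a: dict, b: dict, concepts: set) -> bool:
    """Both domains map 1-1 onto the same conceptual domain.

    Equal cardinality of A, B, and C follows from the two mappings.
    """
    return is_one_to_one_onto(a, concepts) and is_one_to_one_onto(b, concepts)


def harmonize(a: dict, b: dict) -> dict:
    """Translate each value of A into the value of B with the same meaning."""
    concept_to_b = {concept: value for value, concept in b.items()}
    return {value: concept_to_b[concept] for value, concept in a.items()}


# Hypothetical conceptual domain and two encodings of it.
concepts = {"female", "male", "unknown"}
letters = {"F": "female", "M": "male", "U": "unknown"}
digits = {"1": "male", "2": "female", "9": "unknown"}

if are_isomorphic(letters, digits, concepts):
    print(harmonize(letters, digits))
```

Routing the translation through the shared conceptual domain means the two encodings never need to be compared directly.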

Page 19: Metadata Melodies Webinar with David Loshin Presentation

Domain Congruence

Two value domains are congruent if their value sets intersect and the size of the intersection meets or exceeds a predefined threshold

Example:

FIPS 2-Character State Codes contain values for all US States

USPS 2-Character State Codes contain values for all US States as well as AA, AE, and AP postal codes for military delivery

Under certain circumstances, the two domains can be harmonized
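A congruence test might be sketched as follows. The slide does not say how the size of the intersection is measured, so the ratio of intersection to union used here is an assumption, as are the function names and the threshold values. The code sets are deliberately truncated samples, not the full FIPS or USPS lists.

```python
def overlap_ratio(a: set, b: set) -> float:
    """Size of the intersection relative to the union (an assumed measure)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def are_congruent(a: set, b: set, threshold: float) -> bool:
    """True if the domains overlap at least as much as the threshold requires."""
    return overlap_ratio(a, b) >= threshold


# Truncated, illustrative samples only.
fips_sample = {"CA", "MD", "NY", "VA"}
usps_sample = fips_sample | {"AA", "AE", "AP"}  # adds the military delivery codes

print(are_congruent(fips_sample, usps_sample, threshold=0.5))
print(sorted(usps_sample - fips_sample))  # values to reconcile before harmonizing
```

Listing the values outside the intersection is the practical step: those are the codes that must be mapped, excluded, or kept distinct before the two domains are treated as one.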

Page 20: Metadata Melodies Webinar with David Loshin Presentation

Encouraging a Culture of Semantic Harmony

Small variances in definitions within isolated functions become magnified when data is shared across functions

Establish a level playing field by:

Instituting a common business term glossary

Harmonizing business term definitions

Unifying shared reference data into conceptual domains and corresponding value domains

Socializing use of shared metadata

Establishing standards for future development

Integrating methods for monitoring compliance with standards

Page 21: Metadata Melodies Webinar with David Loshin Presentation

Check Out These Resources

www.knowledge-integrity.com

www.dataqualitybook.com

If you have questions, comments, or suggestions, please contact me:

David Loshin

301-754-6350

loshin@knowledge-integrity.com
