metadata melodies webinar with david loshin presentation
DESCRIPTION
See the companion webinar at: http://embt.co/1uHXmjv
The ever-growing interest in data accumulation from multiple sources and organizations for reporting and analysis exposes a dirty secret: those business terms that we all think we understand actually have a wide variety of definitions. Sometimes these variances are largely irrelevant, and do not significantly impact the ability to create a reasonable report. However, there are some instances in which even minor variations in structure, content, or semantics can have a significant impact on delivering trustworthy results. This leads to the question: if we have two different structures or definitions for what appear to be two similar concepts, should we harmonize the definitions and structures into one? In some cases this will be a good idea, and it will lead to increased consistency, but this is only true as long as the two concepts really refer to the same real-world idea. In other cases, the same terms are used for two different ideas, necessitating a division into two or more qualified business terms and definitions.
TRANSCRIPT
Harmonize or Differentiate
David Loshin
Knowledge Integrity Inc
www.knowledge-integrity.com
© 2014 Knowledge Integrity, Inc. www.knowledge-integrity.com
(301)754-6350
Common Business Terms: Are They Really Common?
What is a "state"?
Challenges in Semantic Consistency
Data definitions are often biased around specific business function requirements
The meaning of a concept may slightly differ from application to application
Consolidation without considering semantics will lead to confusion downstream
"Location"
Municipal
Taxation
Mailing
Delivery
Utility
Evolution of the Business Metadata Glossary
Many sources of entity concepts and business terms may conflict with each other
The data steward must facilitate the collection and documentation of business terms
The data steward must also prepare for harmonization of terms
[Diagram: sources of business metadata (Policies, System Docs, Processes, Models, Standards, Applications, Business Rules, Profiling, etc.) yield Entity Concepts and Business Terms; each term carries a Definition and a Contextual Meaning]
Example – Identifying Business Terms
Order Confirmation: If you do not receive a confirmation number (in the form of a confirmation page or email) after submitting payment information, or if you experience an error message or service interruption after submitting payment information, it is your responsibility to confirm with FizzDizzle Customer Service whether or not your order has been placed.
Nouns
• You
• Confirmation number
• Confirmation page
• Confirmation email
• Payment information
• Error message
• Service interruption
• FizzDizzle Customer Service
• Order
Verbs
• Receive
• Submitting
• Experience
• Confirm
• Placed
Entities & Characteristics
Entities are core concepts that are mapped to conceptual data domain models, such as:
Customer, Organization, Order, Product
The conceptual Data Domain is mapped to a container, such as:
File, table, object
Characteristics are attributes of entities modeled as data element concepts
Data element concepts are mapped to data elements
Characteristics
A data element concept has values taken from a conceptual domain
One data element concept might be mapped to more than one instantiation as a data element
A data element has values taken from a value domain
A conceptual domain might be mapped to more than one instantiation as a value domain
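The relationships on this slide can be sketched as a small data model. This is only an illustration, not something taken from the deck: the class names, the element names (SEX_CD, Gender), and the meanings attached to the codes 0/1 and M/F are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConceptualDomain:
    """A set of value meanings, independent of any encoding."""
    name: str
    meanings: frozenset


@dataclass(frozen=True)
class ValueDomain:
    """One concrete encoding (instantiation) of a conceptual domain."""
    name: str
    conceptual_domain: ConceptualDomain
    codes: tuple  # pairs of (stored value, meaning)

    def meaning_of(self, value):
        return dict(self.codes)[value]


@dataclass
class DataElementConcept:
    """A characteristic of an entity; its values come from a conceptual domain."""
    name: str
    conceptual_domain: ConceptualDomain
    # Back-references are excluded from repr/eq to avoid circular comparisons.
    data_elements: list = field(default_factory=list, repr=False, compare=False)


@dataclass
class DataElement:
    """A physical instantiation of a concept; its values come from a value domain."""
    name: str
    concept: DataElementConcept
    value_domain: ValueDomain

    def __post_init__(self):
        self.concept.data_elements.append(self)


# Hypothetical example: one concept, two instantiations with different encodings.
sex = ConceptualDomain("Sex", frozenset({"male", "female"}))
numeric = ValueDomain("Sex (numeric)", sex, (("0", "male"), ("1", "female")))
alpha = ValueDomain("Sex (alpha)", sex, (("M", "male"), ("F", "female")))
concept = DataElementConcept("Person Sex", sex)
DataElement("SEX_CD", concept, numeric)
DataElement("Gender", concept, alpha)
```

Both data elements point back to the same concept, and both value domains point back to the same conceptual domain, which is the "more than one instantiation" case the slide describes.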
More on Entities & Characteristics
An entity may have many characteristics
A data domain may refer to many data element concepts
An instantiated container may have many data elements
Enumeration of Data Elements
Each data element concept is manifested as one or more data elements in a specific system
The template is used to map data element concepts to used data elements
Data Element Identifier: 20988
Name: Customer State
Data Element Concept: State
Usage: Salesforce.com
Conceptual Domain: CST
Value Domain: VST-1
Storage Data Type: Char(2)
Presentation Data Type: Char(2)
Unit of Measure: NA
Business Rules: May not be null
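One way to hold a row of this template in code is a simple record type. The sketch below is an assumption-laden illustration: the class name, field names, and the check_value helper are hypothetical, and it reads Char(2) as "exactly two characters", which the slide does not state.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementRecord:
    identifier: str
    name: str
    data_element_concept: str
    usage: str
    conceptual_domain: str
    value_domain: str
    storage_data_type: str
    presentation_data_type: str
    unit_of_measure: str
    business_rules: tuple


# The example row from the slide.
customer_state = DataElementRecord(
    identifier="20988",
    name="Customer State",
    data_element_concept="State",
    usage="Salesforce.com",
    conceptual_domain="CST",
    value_domain="VST-1",
    storage_data_type="Char(2)",
    presentation_data_type="Char(2)",
    unit_of_measure="NA",
    business_rules=("May not be null",),
)


def check_value(record, value):
    """Return a list of rule violations for one candidate value."""
    problems = []
    if value is None:
        if "May not be null" in record.business_rules:
            problems.append("value is null")
        return problems
    # Assumption: Char(n) means a fixed width of exactly n characters.
    match = re.fullmatch(r"Char\((\d+)\)", record.storage_data_type)
    if match and len(value) != int(match.group(1)):
        problems.append(f"expected {match.group(1)} characters, got {len(value)}")
    return problems
```

For example, `check_value(customer_state, "MD")` reports no problems, while passing `None` or `"Maryland"` reports one problem each.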
Business Terms and Data Element Concepts
Map use of a business term to a definition then to the entity or characteristic
Customer is used in reference to the customer entity
Account Number is used in reference to an attribute of a customer entity
Need to track list of data element concepts
Concept ID | Concept | Business Term | Definition ID
16-A334 | License or Permit Holder | Licensee | BT-977
16-A334 | License or Permit Holder | Permit Holder | BT-983
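The tracking list above can be sketched as a lookup from business term to concept. The two rows come from the slide; the function names and the case-insensitive matching are assumptions made for the example.

```python
from collections import defaultdict

GLOSSARY = [
    # (concept_id, concept, business_term, definition_id)
    ("16-A334", "License or Permit Holder", "Licensee", "BT-977"),
    ("16-A334", "License or Permit Holder", "Permit Holder", "BT-983"),
]


def concept_for(term, rows):
    """Return the concept ID a business term refers to, or None if unknown."""
    for concept_id, _concept, business_term, _definition_id in rows:
        if business_term.lower() == term.lower():
            return concept_id
    return None


def terms_by_concept(rows):
    """Group (business term, definition ID) pairs under their concept ID."""
    grouped = defaultdict(list)
    for concept_id, _concept, business_term, definition_id in rows:
        grouped[concept_id].append((business_term, definition_id))
    return dict(grouped)
```

Here "Licensee" and "Permit Holder" resolve to the same concept ID while keeping separate definition IDs, so the two terms stay traceable even though they share one concept.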
Data Harmonization
Sources: Regulations, Policies, Documents, Forms
Extract & Collect: Extract data definitions from current guidance, documents, etc., and categorize definitions using standard terminology
Integrate & Collate: Integrate data elements into a single reference source, then combine common data elements
Identify Anomalies: Identify inconsistent definitions, conflicts in data domains, format variations
Resolve and Standardize: Identify authoritative source for definitions; assign names using Naming Convention; document in metadata Registry
Extract & Collect
Collected metadata includes
Assigned identifier
Data element name
Related business terms
Definition
Data type
Length
Business rules
Issues/Comments
Reference domain
Standard name
Authoritative sources
Lineage
Data Element | Type
FirstName | VARCHAR(35)
LastName | VARCHAR(40)
SSN | CHAR(11)
Telephone | VARCHAR(20)

Data Element | Type
First | VARCHAR(25)
Middle | VARCHAR(25)
Last | VARCHAR(30)
SocialSec | CHAR(9)
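Once two element lists like these are collected, differences in type and length can be surfaced mechanically. The sketch below is a minimal illustration: the correspondence between element names (NAME_MAP) is an assumption, since the slide does not say which elements match, and the function names are hypothetical.

```python
import re

SYSTEM_A = {"FirstName": "VARCHAR(35)", "LastName": "VARCHAR(40)",
            "SSN": "CHAR(11)", "Telephone": "VARCHAR(20)"}
SYSTEM_B = {"First": "VARCHAR(25)", "Middle": "VARCHAR(25)",
            "Last": "VARCHAR(30)", "SocialSec": "CHAR(9)"}

# Assumed correspondence between the two systems' element names.
NAME_MAP = {"FirstName": "First", "LastName": "Last", "SSN": "SocialSec"}


def parse_type(declared):
    """Split a declaration such as VARCHAR(35) into ('VARCHAR', 35)."""
    match = re.fullmatch(r"([A-Z]+)\((\d+)\)", declared)
    if match is None:
        raise ValueError(f"unrecognized type declaration: {declared}")
    return match.group(1), int(match.group(2))


def compare(a, b, name_map):
    """List type conflicts, length conflicts, and unmatched elements."""
    findings = []
    for a_name, b_name in name_map.items():
        a_type, a_len = parse_type(a[a_name])
        b_type, b_len = parse_type(b[b_name])
        if a_type != b_type:
            findings.append(f"{a_name}/{b_name}: type {a_type} vs {b_type}")
        if a_len != b_len:
            findings.append(f"{a_name}/{b_name}: length {a_len} vs {b_len}")
    for a_name in a:
        if a_name not in name_map:
            findings.append(f"{a_name}: no counterpart in second system")
    for b_name in b:
        if b_name not in name_map.values():
            findings.append(f"{b_name}: no counterpart in first system")
    return findings


FINDINGS = compare(SYSTEM_A, SYSTEM_B, NAME_MAP)
```

The length difference between CHAR(11) and CHAR(9) for the Social Security number suggests that one system stores separators and the other does not, which is a format question for a data steward rather than something the code can decide.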
Understanding Reference Data - Assessment
Data set 1
Jean Montard | 0 | 062672
Michael Evans | 0 | 112168
Fran Peterson | 1 | 030276
Pat Lawson | 1 | 041779

Data set 2
J Montard | M | 062672
M Evans | M | 112168
F Peterson | F | 030276
P Lawson | F | 041779

Data set 3
Jean J Montard | F | 062672
Michael D Evans | F | 112168
Fran S Peterson | M | 030276
Pat O Lawson | M | 041779

• Each of these data sets has matching records for unique individuals
• Each has a code value with a 1:1:1 correspondence
• Understand why the values differ in each data set
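The correspondence between the code values can be derived by matching records across data sets. The sketch below is a rough illustration: it assumes that last name plus birth date is enough to match individuals in this small sample, and the function names are hypothetical.

```python
# Records reduced to (last name, code, birth date) from the three data sets.
SET_1 = [("Montard", "0", "062672"), ("Evans", "0", "112168"),
         ("Peterson", "1", "030276"), ("Lawson", "1", "041779")]
SET_2 = [("Montard", "M", "062672"), ("Evans", "M", "112168"),
         ("Peterson", "F", "030276"), ("Lawson", "F", "041779")]
SET_3 = [("Montard", "F", "062672"), ("Evans", "F", "112168"),
         ("Peterson", "M", "030276"), ("Lawson", "M", "041779")]


def code_mapping(left, right):
    """Map each code in `left` to the codes seen for the same person in `right`."""
    right_by_key = {(last, born): code for last, code, born in right}
    mapping = {}
    for last, code, born in left:
        other = right_by_key.get((last, born))
        if other is not None:
            mapping.setdefault(code, set()).add(other)
    return mapping


def is_one_to_one(mapping):
    """True if every code maps to exactly one code and no target is reused."""
    if any(len(targets) != 1 for targets in mapping.values()):
        return False
    targets = [next(iter(t)) for t in mapping.values()]
    return len(set(targets)) == len(targets)


MAP_1_TO_2 = code_mapping(SET_1, SET_2)  # {"0": {"M"}, "1": {"F"}}
MAP_1_TO_3 = code_mapping(SET_1, SET_3)  # {"0": {"F"}, "1": {"M"}}
```

Both mappings are one-to-one, yet data sets 2 and 3 assign opposite letters to the same individuals. The codes correspond structurally, but the matching alone cannot say what each code means, which is why the slide asks to understand why the values differ.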
Integrate & Collate
Columns: ID, Data Element Name, Definition, L (length), T (type), Data Element Rules, Issues/Comments, Mapping, Authoritative Source

ID: CORP-QWODR-25
Data Element Name: Quarterly Wage Employee Wage Amount
L: 11
T: AN
Data Element Rules: This field will contain the information as provided from the Quarterly Wage record submitted for State Filing
Issues/Comments: PROPOSED
Mapping: State Corporate Quarterly Wage OUTPUT DETAIL RECORD

ID: EMPR-385
Data Element Name: WAGE AMOUNT
Definition: The amount of a person's wages during a Reporting Quarter
L: 11
T: Signed Numeric
Data Element Rules: 00000000000 through 99999999999. The last two positions are implied to be to the right of the decimal point. Conditional for the following output record: Federal Employee Locate Response Record
Mapping / Authoritative Source: Federal Match System
Identify Anomalies
Inconsistency or ambiguity for similarly-named data elements
Inconsistency of explicit data element business rules
Incomplete or inconsistent reference value domains
Inconsistent formats
Conflicting data types
Abbreviations vs full names
Columns: ID, Data Element Name, Definition, L (length), T (type), Data Element Rules, Mapping

ID: SWA-UI-OD-15
Data Element Name: Claimant State
Definition: Lacking definition
L: 2
T: AN
Data Element Rules: If present, this field will contain the Claimant State code as provided on the submitted UI record
Mapping: State UI Output Detail Record

ID: IRS-DME-15
Data Element Name: Sex
Definition: Lacking definition
L: 1
T: AN
Data Element Rules: Sex Code from the Person Table; will contain spaces if not present on the Person Table
Note: Values not specified
Mapping: IRS DATA MATCH EXTRACT RECORD

ID: FCR-69
Data Element Name: Benefit Amount
Definition: The monetary amount of Unemployment Insurance benefits a person received during a Reporting Period
Note: This definition does not specify the Reporting Period; SWA specify quarter [see below]
L: 11
T: AN
Data Element Rules: 00000000000 through 99999999999. The last two positions are implied to be to the right of the decimal point. This field will contain all zeroes when there is no Benefit Amount or the information is not available. Conditional for the following output record: Federal Match Record

ID: SWA-OD-19
Data Element Name: Benefit Amount
Definition: This field will contain the gross amount of UI benefits, prior to any deductions, paid to a claimant during the reporting quarter, as provided on the UI record submitted to the NDNH
L: 11
T: AN
Data Element Rules: Values are 00000000000 through 99999999999 without decimal. This field is whole dollars only
Note: Potential conflict with FCR-69
Mapping: SWA UI Output Detail Record
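The potential conflict between the two Benefit Amount elements is easy to see when the same stored string is read under each rule. The sketch below follows the rules as stated for FCR-69 (last two positions are implied decimals) and SWA-OD-19 (whole dollars only); the function names and the sample value are hypothetical.

```python
from decimal import Decimal


def _check(raw):
    """Both elements are declared as 11 positions holding digits only."""
    if len(raw) != 11 or not raw.isdigit():
        raise ValueError(f"expected 11 digits, got {raw!r}")


def parse_fcr_69(raw):
    """FCR-69 rule: the last two positions are to the right of the decimal point."""
    _check(raw)
    return Decimal(raw) / Decimal(100)


def parse_swa_od_19(raw):
    """SWA-OD-19 rule: whole dollars only, no decimal."""
    _check(raw)
    return Decimal(raw)


SAMPLE = "00000012345"  # hypothetical stored value
AS_FCR_69 = parse_fcr_69(SAMPLE)        # Decimal("123.45")
AS_SWA_OD_19 = parse_swa_od_19(SAMPLE)  # Decimal("12345")
```

The two readings differ by a factor of 100, so merging these elements without converting one representation to the other would silently corrupt the amounts.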
Resolve & Standardize
Identify authoritative sources
Prioritize potential harmonized definitions
Review with subject matter experts
Consolidate if possible
Differentiate if necessary
Isomorphic Domains
We can say that value domains A and B are isomorphic if:
The cardinality of A is equal to the cardinality of B (they have the same number of values)
Both A and B are associated with a conceptual domain C with the same cardinality as A and B
There is an enumeration of value meanings in A that is a 1-1 mapping to concepts in C
There is an enumeration of value meanings in B that is a 1-1 mapping to concepts in C
In other words, if two value domains have a one-to-one mapping to the same conceptual domain values, they are isomorphic
Isomorphic domains can be harmonized
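The conditions above can be checked directly when each value domain is written as a mapping from stored value to meaning. This is a minimal sketch under that assumption; the function names and the example meanings attached to 0/1 and M/F are hypothetical.

```python
def is_one_to_one_onto(value_meanings, conceptual_domain):
    """True if value -> meaning is a 1-1 mapping covering the conceptual domain."""
    meanings = list(value_meanings.values())
    return (len(set(meanings)) == len(meanings)
            and set(meanings) == set(conceptual_domain))


def are_isomorphic(domain_a, domain_b, conceptual_domain):
    """Apply the cardinality and 1-1 mapping conditions to A, B, and C."""
    return (len(domain_a) == len(domain_b) == len(conceptual_domain)
            and is_one_to_one_onto(domain_a, conceptual_domain)
            and is_one_to_one_onto(domain_b, conceptual_domain))


def translation(domain_a, domain_b):
    """Build the A -> B value translation implied by the shared meanings."""
    by_meaning = {meaning: value for value, meaning in domain_b.items()}
    return {value: by_meaning[meaning] for value, meaning in domain_a.items()}


# Hypothetical example domains.
C = {"male", "female"}
A = {"0": "male", "1": "female"}
B = {"M": "male", "F": "female"}
```

With these inputs `are_isomorphic(A, B, C)` holds and `translation(A, B)` gives the value-to-value lookup needed to harmonize A into B. A domain that adds a third value such as "U" for unknown would fail the cardinality condition.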
Domain Congruence
Two value domains are congruent if their value sets intersect and the size of the intersection meets or exceeds a predefined threshold
Example
FIPS 2-Character State Codes contain values for all US States
USPS 2-Character State Codes contain values for all US States as well as AA, AE, and AP postal codes for military delivery
Under certain circumstances the two domains can be harmonized
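The congruence test reduces to a set intersection compared with a threshold. In the sketch below the threshold is a count of shared values, which is one reading of "size of the intersection"; the function name and the four-state sample are assumptions, and only AA, AE, and AP come from the slide.

```python
def are_congruent(values_a, values_b, threshold):
    """True if the two value sets share at least `threshold` values."""
    shared = set(values_a) & set(values_b)
    return len(shared) >= threshold


# Illustrative subset only, not the full code lists.
FIPS_SAMPLE = {"MD", "VA", "NY", "CA"}
USPS_SAMPLE = FIPS_SAMPLE | {"AA", "AE", "AP"}

# Values that exist in only one domain must be handled before harmonizing.
ONLY_IN_USPS = USPS_SAMPLE - FIPS_SAMPLE
```

With this sample, a threshold of 4 is met and a threshold of 5 is not. The values left in `ONLY_IN_USPS` are the ones that decide whether harmonizing is safe for a given use, such as whether military delivery addresses matter to the application.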
Encouraging a Culture of Semantic Harmony
Small variances in definitions in isolated functions become magnified when data is shared across functions
Establish a level playing field by:
Instituting a common business term glossary
Harmonizing business term definitions
Unifying shared reference data into conceptual domains and corresponding value domains
Socializing use of shared metadata
Establishing standards for future development
Integrating methods for monitoring compliance with standards
Check Out These Resources
www.knowledge-integrity.com
www.dataqualitybook.com
If you have questions, comments, or suggestions, please contact me:
David Loshin
301-754-6350
loshin@knowledge-integrity.com
copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom
(301)754-6350
19
Encouraging a Culture of Semantic Harmony
Small variance in definitions in isolated functions become magnified when data is shared across functions
Establish a level playing ground by
Instituting a common business term glossary
Harmonizing business term definitions
Unifying shared reference data into conceptual domains and corresponding value domains
Socializing use of shared metadata
Establishing standards for future development
Integrate methods for monitoring compliance with standards
copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom
(301)754-6350
20
Check Out These Resources
wwwknowledge-integritycom
wwwdataqualitybookcom
If you have questions comments or suggestions please contact me
David Loshin
301-754-6350
loshinknowledge-integritycom
copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom
(301) 754-6350
21
Evolution of the Business Metadata Glossary
Many sources of entity concepts and business terms may conflict with each other
The data steward must facilitate the collection and documentation of business terms
The data steward must also prepare for harmonization of terms
[Diagram: source artifacts (Policies, System Docs, Processes, Models, Standards, Applications, Business Rules, Profiling, etc.) yield Entity Concepts and Business Terms; each term is linked to one or more Definition / Contextual Meaning entries]
Example - Identifying Business Terms
Order Confirmation
If you do not receive a confirmation number (in the form of a confirmation page or email) after submitting payment information, or if you experience an error message or service interruption after submitting payment information, it is your responsibility to confirm with FizzDizzle Customer Service whether or not your order has been placed.
Nouns:
• You
• Confirmation number
• Confirmation page
• Confirmation email
• Payment information
• Error message
• Service interruption
• FizzDizzle Customer Service
• Order
Verbs:
• Receive
• Submitting
• Experience
• Confirm
• Placed
Entities & Characteristics
Entities are core concepts that are mapped to conceptual data domain models, such as:
Customer, Organization, Order, Product
The conceptual data domain is mapped to a container, such as:
File, table, object
Characteristics are attributes of entities, modeled as data element concepts
Data element concepts are mapped to data elements
Characteristics
A data element concept has values taken from a conceptual domain
One data element concept might be mapped to more than one instantiation as a data element
A data element has values taken from a value domain
A conceptual domain might be mapped to more than one instantiation as a value domain
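The relationships listed on this slide can be sketched as a small data model. This is only an illustration, not something shown in the presentation: the class names, the element names (`CUST_STATE`, `STATE_CD`), and the two-state sample are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConceptualDomain:
    """A set of value meanings, independent of how they are encoded."""
    name: str
    concepts: frozenset


@dataclass
class ValueDomain:
    """One concrete encoding of a conceptual domain."""
    name: str
    conceptual_domain: ConceptualDomain
    value_meanings: dict  # permitted value -> concept it stands for


@dataclass
class DataElementConcept:
    """A characteristic of an entity, e.g. the state of a customer."""
    name: str
    conceptual_domain: ConceptualDomain


@dataclass
class DataElement:
    """An instantiation of a data element concept in a specific system."""
    name: str
    concept: DataElementConcept
    value_domain: ValueDomain

    def is_consistent(self) -> bool:
        # The value domain must encode the same conceptual domain
        # that the data element concept draws its values from.
        return self.value_domain.conceptual_domain == self.concept.conceptual_domain


# Hypothetical sample: one conceptual domain, two value domains.
states = ConceptualDomain("US State", frozenset({"Maryland", "Virginia"}))
alpha = ValueDomain("State (2-letter)", states, {"MD": "Maryland", "VA": "Virginia"})
numeric = ValueDomain("State (2-digit)", states, {"24": "Maryland", "51": "Virginia"})

# One data element concept, instantiated as two data elements.
customer_state = DataElementConcept("Customer State", states)
crm_state = DataElement("CUST_STATE", customer_state, alpha)
billing_state = DataElement("STATE_CD", customer_state, numeric)
```

Both data elements share a single concept and conceptual domain, which is what makes a later translation between their value domains possible.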
More on Entities & Characteristics
An entity may have many characteristics
A data domain may refer to many data element concepts
An instantiated container may have many data elements
Enumeration of Data Elements
Each data element concept is manifested as one or more data elements in a specific system
The template is used to map data element concepts to the data elements in use
Data Element Identifier: 20988
Name: Customer State
Data Element Concept: State
Usage: Salesforce.com
Conceptual Domain: CST
Value Domain: VST-1
Storage Data Type: Char(2)
Presentation Data Type: Char(2)
Unit of Measure: NA
Business Rules: May not be null
Business Terms and Data Element Concepts
Map use of a business term to a definition then to the entity or characteristic
Customer is used in reference to the customer entity
Account Number is used in reference to an attribute of a customer entity
Need to track list of data element concepts
Concept ID | Concept | Business Term | Definition ID
16-A334 | License or Permit Holder | Licensee | BT-977
16-A334 | License or Permit Holder | Permit Holder | BT-983
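A minimal sketch of how the tracking list above might be queried, assuming the rows are held as simple tuples. The function names are hypothetical; the two rows are the ones from the table.

```python
from collections import defaultdict

# (concept_id, concept, business_term, definition_id), copied from the table.
GLOSSARY = [
    ("16-A334", "License or Permit Holder", "Licensee", "BT-977"),
    ("16-A334", "License or Permit Holder", "Permit Holder", "BT-983"),
]


def terms_by_concept(rows):
    """Group business terms (with their definition IDs) under each concept ID."""
    grouped = defaultdict(list)
    for concept_id, _concept, term, definition_id in rows:
        grouped[concept_id].append((term, definition_id))
    return dict(grouped)


def concept_for_term(rows, term):
    """Return the concept ID a business term refers to, or None if unknown."""
    wanted = term.casefold()
    for concept_id, _concept, business_term, _definition_id in rows:
        if business_term.casefold() == wanted:
            return concept_id
    return None


synonyms = terms_by_concept(GLOSSARY)
```

Two business terms that resolve to the same concept ID, as "Licensee" and "Permit Holder" do here, are candidates for a single harmonized definition.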
Data Harmonization
Inputs: Regulations, Policies, Documents, Forms
Extract & Collect: Extract data definitions from current guidance documents, etc., and categorize definitions using standard terminology
Integrate & Collate: Integrate data elements into a single reference source, then combine common data elements
Identify Anomalies: Identify inconsistent definitions, conflicts in data domains, format variations
Resolve and Standardize: Identify authoritative source for definitions; assign names using naming convention; document in metadata registry
Extract & Collect
Collected metadata includes:
Assigned identifier
Data element name
Related business terms
Definition
Data type
Length
Business rules
Issues/Comments
Reference domain
Standard name
Authoritative sources
Lineage
Data Element | Type
FirstName | VARCHAR(35)
LastName | VARCHAR(40)
SSN | CHAR(11)
Telephone | VARCHAR(20)

Data Element | Type
First | VARCHAR(25)
Middle | VARCHAR(25)
Last | VARCHAR(30)
SocialSec | CHAR(9)
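The two layouts above can be compared mechanically once each element is assigned a standard name. The sketch below assumes a hand-built name mapping (`STANDARD_NAME` is hypothetical, as are the function names); only the element names and types come from the slide.

```python
import re

SOURCE_A = {"FirstName": "VARCHAR(35)", "LastName": "VARCHAR(40)",
            "SSN": "CHAR(11)", "Telephone": "VARCHAR(20)"}
SOURCE_B = {"First": "VARCHAR(25)", "Middle": "VARCHAR(25)",
            "Last": "VARCHAR(30)", "SocialSec": "CHAR(9)"}

# Hypothetical standard names; in practice a data steward would assign these.
STANDARD_NAME = {
    "FirstName": "first_name", "First": "first_name",
    "LastName": "last_name", "Last": "last_name",
    "SSN": "ssn", "SocialSec": "ssn",
    "Middle": "middle_name", "Telephone": "telephone",
}


def parse_type(declaration):
    """Split a declaration such as 'VARCHAR(35)' into ('VARCHAR', 35)."""
    match = re.fullmatch(r"([A-Z]+)\((\d+)\)", declaration)
    if match is None:
        raise ValueError(f"unrecognized type declaration: {declaration!r}")
    return match.group(1), int(match.group(2))


def standardize(layout):
    """Re-key a layout by standard name, with parsed (type, length) values."""
    return {STANDARD_NAME[name]: parse_type(decl) for name, decl in layout.items()}


def find_anomalies(layout_a, layout_b):
    """List (standard_name, issue) pairs where the two layouts disagree."""
    a, b = standardize(layout_a), standardize(layout_b)
    anomalies = []
    for name in sorted(a.keys() | b.keys()):
        if name not in a:
            anomalies.append((name, "missing from A"))
        elif name not in b:
            anomalies.append((name, "missing from B"))
        elif a[name] != b[name]:
            anomalies.append((name, "type or length differs"))
    return anomalies


anomalies = find_anomalies(SOURCE_A, SOURCE_B)
```

For these two layouts every shared element differs in length, and each source carries one element the other lacks. A length difference such as CHAR(11) versus CHAR(9) may only reflect formatting (for example, stored separators), which is why the result is a list to review rather than an automatic fix.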
Understanding Reference Data - Assessment
Data set 1:
Jean Montard | 0 | 062672
Michael Evans | 0 | 112168
Fran Peterson | 1 | 030276
Pat Lawson | 1 | 041779

Data set 2:
J Montard | M | 062672
M Evans | M | 112168
F Peterson | F | 030276
P Lawson | F | 041779

Data set 3:
Jean J Montard | F | 062672
Michael D Evans | F | 112168
Fran S Peterson | M | 030276
Pat O Lawson | M | 041779

• Each of these data sets has matching records for unique individuals
• Each has a code value with a 1:1:1 correspondence
• Understand why the values differ in each data set
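The correspondence between the code values in these data sets can be checked by joining records on a shared key. The sketch below assumes that last name plus the six-digit date field identifies an individual; that key and the function name are assumptions for illustration.

```python
def code_correspondence(left, right):
    """Derive a left-code -> right-code mapping from records sharing a key.

    Raises ValueError if the codes do not correspond one-to-one.
    """
    forward, backward = {}, {}
    for key in sorted(left.keys() & right.keys()):
        l_code, r_code = left[key], right[key]
        if forward.setdefault(l_code, r_code) != r_code:
            raise ValueError(f"{l_code!r} maps to more than one code")
        if backward.setdefault(r_code, l_code) != l_code:
            raise ValueError(f"{r_code!r} is mapped from more than one code")
    return forward


# Keyed by (last name, date field); values are the code column.
set_1 = {("Montard", "062672"): "0", ("Evans", "112168"): "0",
         ("Peterson", "030276"): "1", ("Lawson", "041779"): "1"}
set_2 = {("Montard", "062672"): "M", ("Evans", "112168"): "M",
         ("Peterson", "030276"): "F", ("Lawson", "041779"): "F"}
set_3 = {("Montard", "062672"): "F", ("Evans", "112168"): "F",
         ("Peterson", "030276"): "M", ("Lawson", "041779"): "M"}

one_to_two = code_correspondence(set_1, set_2)
two_to_three = code_correspondence(set_2, set_3)
```

The check confirms only that the codes line up one-to-one. It cannot say why "M" in one data set corresponds to "F" in another; the same letters may stand for different meanings, which is the question the slide asks the reader to investigate.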
Integrate & Collate
Columns: ID, Data Element Name, Definition, L (length), T (type), Data Element Rules, Issues/Comments, Mapping, Authoritative Source

ID: CORP-QWODR-25
Data Element Name: Quarterly Wage Employee Wage Amount
L: 11
T: AN
Data Element Rules: This field will contain the information as provided from the Quarterly Wage record submitted for State Filing
Mapping: PROPOSED State Corporate Quarterly Wage OUTPUT DETAIL RECORD

ID: EMPR-385
Data Element Name: WAGE AMOUNT
Definition: The amount of a person's wages during a Reporting Quarter
L: 11
T: Signed Numeric
Data Element Rules: 00000000000 through 99999999999. The last two positions are implied to be to the right of the decimal point. Conditional for the following output record: Federal Employee Locate Response Record
Mapping: Federal Match System
Identify Anomalies
Inconsistency or ambiguity for similarly-named data elements
Inconsistency of explicit data element business rules
Incomplete or inconsistent reference value domains
Inconsistent formats
Conflicting data types
Abbreviations vs full names
Columns: ID, Data Element Name, Definition, L (length), T (type), Data Element Rules, Mapping

ID: SWA-UI-OD-15
Data Element Name: Claimant State
Definition: Lacking definition
L: 2
T: AN
Data Element Rules: If present, this field will contain the Claimant State code as provided on the submitted UI record
Mapping: State UI Output Detail Record

ID: IRS-DME-15
Data Element Name: Sex
Definition: Lacking definition
L: 1
T: AN
Data Element Rules: Sex Code from the Person Table; will contain spaces if not present on the Person Table. Values not specified
Mapping: IRS DATA MATCH EXTRACT RECORD

ID: FCR-69
Data Element Name: Benefit Amount
Definition: The monetary amount of Unemployment Insurance benefits a person received during a Reporting Period. (This definition does not specify the Reporting Period; SWA specifies quarter [see below])
L: 11
T: AN
Data Element Rules: 00000000000 through 99999999999. The last two positions are implied to be to the right of the decimal point. This field will contain all zeroes when there is no Benefit Amount or the information is not available. Conditional for the following output record: Federal Match Record

ID: SWA-OD-19
Data Element Name: Benefit Amount
Definition: This field will contain the gross amount of UI benefits, prior to any deductions, paid to a claimant during the reporting quarter, as provided on the UI record submitted to the NDNH
L: 11
T: AN
Data Element Rules: Values are 00000000000 through 99999999999 without decimal. This field is whole dollars only. Potential conflict with FCR-69
Mapping: SWA UI Output Detail Record
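The potential conflict between the two Benefit Amount elements is easy to see when the same 11-character value is read under each rule. This is a sketch based only on the rules quoted above; the function names are hypothetical.

```python
from decimal import Decimal


def _require_11_digits(field):
    """Both elements are defined as 11 positions holding digits only."""
    if len(field) != 11 or not (field.isascii() and field.isdigit()):
        raise ValueError(f"expected 11 digits, got {field!r}")


def parse_implied_decimal(field):
    """FCR-69 rule: the last two positions are to the right of the decimal point."""
    _require_11_digits(field)
    return Decimal(field) / 100


def parse_whole_dollars(field):
    """SWA-OD-19 rule: whole dollars only, no decimal positions."""
    _require_11_digits(field)
    return Decimal(field)


raw = "00000012345"
as_fcr_69 = parse_implied_decimal(raw)     # 123.45
as_swa_od_19 = parse_whole_dollars(raw)    # 12345
```

The two readings differ by a factor of 100, so harmonizing the elements means converting one representation to the other rather than copying values across. Note also that under FCR-69 a field of all zeroes means either no benefit or no information, which a plain numeric parse cannot distinguish.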
Resolve & Standardize
Identify authoritative sources
Prioritize potential harmonized definitions
Review with subject matter experts
Consolidate if possible
Differentiate if necessary
Isomorphic Domains
We can say that value domains A and B are isomorphic if:
The cardinality of A is equal to the cardinality of B (they have the same number of values)
Both A and B are associated with a conceptual domain C with the same cardinality as A and B
There is an enumeration of value meanings in A that is a 1-1 mapping to concepts in C
There is an enumeration of value meanings in B that is a 1-1 mapping to concepts in C
In other words, if two value domains have a one-to-one mapping to the same conceptual domain values, they are isomorphic
Isomorphic domains can be harmonized
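The conditions for isomorphic domains translate directly into a check. In this sketch a value domain is represented as a dict from value to the concept it stands for. That representation, the function names, and the three-state sample are assumptions made for illustration.

```python
def is_one_to_one_onto(value_meanings, concepts):
    """True if every concept is the meaning of exactly one value."""
    meanings = list(value_meanings.values())
    return len(set(meanings)) == len(meanings) and set(meanings) == set(concepts)


def are_isomorphic(domain_a, domain_b, concepts):
    """Apply the slide's conditions to value domains A and B and conceptual domain C."""
    same_cardinality = len(domain_a) == len(domain_b) == len(set(concepts))
    return (same_cardinality
            and is_one_to_one_onto(domain_a, concepts)
            and is_one_to_one_onto(domain_b, concepts))


def harmonization_map(domain_a, domain_b):
    """Translate each value of A into the value of B with the same meaning."""
    concept_to_b = {concept: value for value, concept in domain_b.items()}
    return {value: concept_to_b[concept] for value, concept in domain_a.items()}


# Hypothetical three-state sample.
concepts = {"Maryland", "Virginia", "Delaware"}
alpha = {"MD": "Maryland", "VA": "Virginia", "DE": "Delaware"}
numeric = {"24": "Maryland", "51": "Virginia", "10": "Delaware"}
with_military = dict(alpha, AE="Armed Forces Europe")

translation = harmonization_map(alpha, numeric)
```

When the check passes, the translation table is the harmonization: every value in one domain has exactly one counterpart in the other. The domain with the extra military code fails the check, which leads into the weaker notion of congruence.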
Domain Congruence
Two value domains are congruent if their value sets intersect and the size of the intersection meets or exceeds a predefined threshold
Example:
FIPS 2-Character State Codes contain values for all US States
USPS 2-Character State Codes contain values for all US States, as well as AA, AE, and AP postal codes for military delivery
Under certain circumstances the two domains can be harmonized
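A sketch of a congruence test follows. The slide does not say how the threshold is expressed, so this example assumes it is a minimum ratio of shared values to all distinct values (intersection over union). The function names are hypothetical, and the code sets are a four-state sample rather than the full lists.

```python
def overlap_ratio(domain_a, domain_b):
    """Share of all distinct values that appear in both domains."""
    union = domain_a | domain_b
    if not union:
        return 0.0
    return len(domain_a & domain_b) / len(union)


def are_congruent(domain_a, domain_b, threshold):
    """True if the domains intersect and the overlap meets the threshold.

    Assumption: the threshold is a ratio between 0 and 1, not a count.
    """
    return bool(domain_a & domain_b) and overlap_ratio(domain_a, domain_b) >= threshold


# Sample only: four state codes, plus the three military codes on the USPS side.
fips_sample = {"MD", "VA", "DE", "NY"}
usps_sample = fips_sample | {"AA", "AE", "AP"}

only_in_usps = usps_sample - fips_sample
```

Values that fall outside the intersection, here the military codes, are the ones that need an explicit decision (map, reject, or keep as a qualified extension) before the two domains are harmonized.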
Encouraging a Culture of Semantic Harmony
Small variances in definitions within isolated functions become magnified when data is shared across functions
Establish a level playing field by:
Instituting a common business term glossary
Harmonizing business term definitions
Unifying shared reference data into conceptual domains and corresponding value domains
Socializing use of shared metadata
Establishing standards for future development
Integrating methods for monitoring compliance with standards
Check Out These Resources
www.knowledge-integrity.com
www.dataqualitybook.com
If you have questions, comments, or suggestions, please contact me:
David Loshin
301-754-6350
loshin@knowledge-integrity.com
Entities & Characteristics
Entities are core concepts that are mapped to conceptual data domain models, such as:
Customer, Organization, Order, Product
The conceptual data domain is mapped to a container, such as:
File, table, object
Characteristics are attributes of entities modeled as data element concepts
Data element concepts are mapped to data elements
Characteristics
A data element concept has values taken from a conceptual domain
One data element concept might be mapped to more than one instantiation as a data element
A data element has values taken from a value domain
A conceptual domain might be mapped to more than one instantiation as a value domain
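The relationships on these slides can be sketched as a small object model. This is only an illustration under stated assumptions: the class names follow the slide vocabulary, but the fields, the system names (CRM, Billing), and the column names (CUST_ST, STATE_CD) are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass


@dataclass
class ConceptualDomain:
    """A set of value meanings, independent of any encoding."""
    name: str
    concepts: set


@dataclass
class ValueDomain:
    """One concrete encoding of a conceptual domain (code -> concept)."""
    name: str
    conceptual_domain: ConceptualDomain
    codes: dict

    def __post_init__(self):
        # Every code must point at a concept the conceptual domain knows.
        unknown = set(self.codes.values()) - self.conceptual_domain.concepts
        if unknown:
            raise ValueError(f"codes map to unknown concepts: {sorted(unknown)}")


@dataclass
class DataElementConcept:
    """A characteristic of an entity, with values from a conceptual domain."""
    entity: str
    characteristic: str
    conceptual_domain: ConceptualDomain


@dataclass
class DataElement:
    """An instantiation of a data element concept in a specific system."""
    system: str
    name: str
    concept: DataElementConcept
    value_domain: ValueDomain

    def __post_init__(self):
        # The value domain must encode the concept's own conceptual domain.
        if self.value_domain.conceptual_domain is not self.concept.conceptual_domain:
            raise ValueError("value domain does not match the concept's conceptual domain")


# Hypothetical example: one concept, two instantiations with different encodings.
states = ConceptualDomain("US State", {"Maryland", "Virginia"})
alpha = ValueDomain("2-letter code", states, {"MD": "Maryland", "VA": "Virginia"})
numeric = ValueDomain("2-digit code", states, {"24": "Maryland", "51": "Virginia"})
customer_state = DataElementConcept("Customer", "State", states)
crm_state = DataElement("CRM", "CUST_ST", customer_state, alpha)
billing_state = DataElement("Billing", "STATE_CD", customer_state, numeric)
```

Both data elements point at the same data element concept, while each draws its values from a different value domain of the same conceptual domain.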
More on Entities & Characteristics
An entity may have many characteristics
A data domain may refer to many data element concepts
An instantiated container may have many data elements
Enumeration of Data Elements
Each data element concept is manifested as one or more data elements in a specific system
The template is used to map data element concepts to the data elements in use
Data Element Identifier: 20988
Name: Customer State
Data Element Concept: State
Usage: Salesforce.com
Conceptual Domain: CST
Value Domain: VST-1
Storage Data Type: Char(2)
Presentation Data Type: Char(2)
Unit of Measure: NA
Business Rules: May not be null
Business Terms and Data Element Concepts
Map use of a business term to a definition, then to the entity or characteristic
"Customer" is used in reference to the customer entity
"Account Number" is used in reference to an attribute of a customer entity
Need to track the list of data element concepts
Concept ID   Concept                    Business Term   Definition ID
16-A334      License or Permit Holder   Licensee        BT-977
16-A334      License or Permit Holder   Permit Holder   BT-983
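As a rough sketch, the term-to-concept table above can be held as rows and indexed in both directions. The row values are the ones on the slide; the function names and the case-insensitive lookup are assumptions made for illustration.

```python
from collections import defaultdict

# (concept_id, concept, business_term, definition_id), as shown on the slide.
GLOSSARY_ROWS = [
    ("16-A334", "License or Permit Holder", "Licensee", "BT-977"),
    ("16-A334", "License or Permit Holder", "Permit Holder", "BT-983"),
]


def index_by_term(rows):
    """Look up the concept and definition behind a business term."""
    return {
        term.lower(): {"concept_id": cid, "concept": concept, "definition_id": did}
        for cid, concept, term, did in rows
    }


def terms_by_concept(rows):
    """List every business term that refers to the same concept."""
    grouped = defaultdict(list)
    for cid, _concept, term, _did in rows:
        grouped[cid].append(term)
    return dict(grouped)


by_term = index_by_term(GLOSSARY_ROWS)
by_concept = terms_by_concept(GLOSSARY_ROWS)
```

Here two business terms with separate definition IDs resolve to a single concept, which makes them candidates for review: either the definitions can be harmonized or the terms need to be differentiated.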
Data Harmonization
Sources: Regulations, Policies, Documents, Forms
Extract & Collect: Extract data definitions from current guidance documents, etc., and categorize definitions using standard terminology
Integrate & Collate: Integrate data elements into a single reference source, then combine common data elements
Identify Anomalies: Identify inconsistent definitions, conflicts in data domains, format variations
Resolve and Standardize: Identify authoritative source for definitions; assign names using naming convention; document in metadata registry
Extract & Collect
Collected metadata includes
Assigned identifier
Data element name
Related business terms
Definition
Data type
Length
Business rules
Issues/Comments
Reference domain
Standard name
Authoritative sources
Lineage
Data Element   Type
FirstName      VARCHAR(35)
LastName       VARCHAR(40)
SSN            CHAR(11)
Telephone      VARCHAR(20)

Data Element   Type
First          VARCHAR(25)
Middle         VARCHAR(25)
Last           VARCHAR(30)
SocialSec      CHAR(9)
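A comparison of the two layouts above might be sketched as follows. The element names and types are taken from the slide; the pairing of elements (FirstName with First, and so on) is an assumption, since the slide does not state which elements correspond, and the function names are hypothetical.

```python
import re

SYSTEM_A = {
    "FirstName": "VARCHAR(35)",
    "LastName": "VARCHAR(40)",
    "SSN": "CHAR(11)",
    "Telephone": "VARCHAR(20)",
}
SYSTEM_B = {
    "First": "VARCHAR(25)",
    "Middle": "VARCHAR(25)",
    "Last": "VARCHAR(30)",
    "SocialSec": "CHAR(9)",
}
# Assumed correspondence between elements; not given in the presentation.
ASSUMED_MAPPING = {"FirstName": "First", "LastName": "Last", "SSN": "SocialSec"}

TYPE_PATTERN = re.compile(r"^(\w+)\((\d+)\)$")


def parse_type(declaration):
    """Split a declaration such as 'VARCHAR(35)' into ('VARCHAR', 35)."""
    match = TYPE_PATTERN.match(declaration)
    if match is None:
        raise ValueError(f"unrecognized type declaration: {declaration!r}")
    return match.group(1), int(match.group(2))


def compare_layouts(a, b, mapping):
    """Report type and length differences, plus elements with no counterpart."""
    findings = []
    for name_a, name_b in mapping.items():
        type_a, length_a = parse_type(a[name_a])
        type_b, length_b = parse_type(b[name_b])
        if type_a != type_b:
            findings.append((name_a, name_b, "type", type_a, type_b))
        if length_a != length_b:
            findings.append((name_a, name_b, "length", length_a, length_b))
    only_in_a = sorted(set(a) - set(mapping))
    only_in_b = sorted(set(b) - set(mapping.values()))
    return findings, only_in_a, only_in_b


findings, only_in_a, only_in_b = compare_layouts(SYSTEM_A, SYSTEM_B, ASSUMED_MAPPING)
```

Under that assumed pairing, every matched element differs in length, and each layout carries one element the other lacks. Those are the kinds of structural differences that have to be resolved before the two sources are consolidated.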
Understanding Reference Data - Assessment
Jean Montard 0 062672
Michael Evans 0 112168
Fran Peterson 1 030276
Pat Lawson 1 041779

J Montard M 062672
M Evans M 112168
F Peterson F 030276
P Lawson F 041779

Jean J Montard F 062672
Michael D Evans F 112168
Fran S Peterson M 030276
Pat O Lawson M 041779

Each of these data sets has matching records for unique individuals
Each has a code value with a 1:1:1 correspondence
Understand why the values differ in each data set
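One way to surface the correspondence between the code values is to match records across data sets and record which codes co-occur. The sketch below assumes that last name plus the six-digit value is a sufficient match key for these four records; that key and the function name are illustrative choices, not something the slide prescribes.

```python
# (last_name, six_digit_value, code) for each of the three data sets on the slide.
SET_1 = [("Montard", "062672", "0"), ("Evans", "112168", "0"),
         ("Peterson", "030276", "1"), ("Lawson", "041779", "1")]
SET_2 = [("Montard", "062672", "M"), ("Evans", "112168", "M"),
         ("Peterson", "030276", "F"), ("Lawson", "041779", "F")]
SET_3 = [("Montard", "062672", "F"), ("Evans", "112168", "F"),
         ("Peterson", "030276", "M"), ("Lawson", "041779", "M")]


def infer_code_mapping(source, target):
    """Infer a one-to-one code mapping from records matched on the assumed key."""
    target_codes = {(last, value): code for last, value, code in target}
    mapping = {}
    for last, value, code in source:
        key = (last, value)
        if key not in target_codes:
            continue  # no matching record in the target data set
        other = target_codes[key]
        # A source code that lands on two different target codes is not 1:1.
        if mapping.setdefault(code, other) != other:
            raise ValueError(f"code {code!r} maps to more than one target code")
    # Two source codes landing on the same target code is not 1:1 either.
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("mapping is not one-to-one")
    return mapping


set1_to_set2 = infer_code_mapping(SET_1, SET_2)
set2_to_set3 = infer_code_mapping(SET_2, SET_3)
```

The inferred mappings show that the same letters carry opposite assignments in the second and third data sets. The code can detect that the correspondence exists, but not what each code means; that still requires the definitions behind each source.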
Integrate & Collate
Columns: ID, Data Element Name, Definition, L (length), T (type), Data Element Rules, Issues/Comments, Mapping, Authoritative Source

ID: CORP-QWODR-25
Data Element Name: Quarterly Wage Employee Wage Amount
L: 11
T: AN
Data Element Rules: This field will contain the information as provided from the Quarterly Wage record submitted for State Filing
Mapping: PROPOSED State Corporate Quarterly Wage OUTPUT DETAIL RECORD

ID: EMPR-385
Data Element Name: WAGE AMOUNT
Definition: The amount of a person's wages during a Reporting Quarter
L: 11
T: Signed Numeric
Data Element Rules: 00000000000 through 99999999999. The last two positions are implied to be to the right of the decimal point. Conditional for the following output record: Federal Employee Locate Response Record
Mapping: Federal Match System
Identify Anomalies
Inconsistency or ambiguity for similarly-named data elements
Inconsistency of explicit data element business rules
Incomplete or inconsistent reference value domains
Inconsistent formats
Conflicting data types
Abbreviations vs full names
Columns: ID, Data Element Name, Definition, L (length), T (type), Data Element Rules, Mapping

ID: SWA-UI-OD-15
Data Element Name: Claimant State
Definition: Lacking definition
L: 2
T: AN
Data Element Rules: If present, this field will contain the Claimant State code as provided on the submitted UI record
Mapping: State UI Output Detail Record

ID: IRS-DME-15
Data Element Name: Sex
Definition: Lacking definition
L: 1
T: AN
Data Element Rules: Sex Code from the Person Table; will contain spaces if not present on the Person Table
Note: Values not specified
Mapping: IRS DATA MATCH EXTRACT RECORD

ID: FCR-69
Data Element Name: Benefit Amount
Definition: The monetary amount of Unemployment Insurance benefits a person received during a Reporting Period
Note: This definition does not specify the Reporting Period; SWA specify quarter [see below]
L: 11
T: AN
Data Element Rules: 00000000000 through 99999999999. The last two positions are implied to be to the right of the decimal point. This field will contain all zeroes when there is no Benefit Amount or the information is not available. Conditional for the following output record: Federal Match Record

ID: SWA-OD-19
Data Element Name: Benefit Amount
Definition: This field will contain the gross amount of UI benefits, prior to any deductions, paid to a claimant during the reporting quarter, as provided on the UI record submitted to the NDNH
L: 11
T: AN
Data Element Rules: Values are 00000000000 through 99999999999 without decimal. This field is whole dollars only
Note: Potential conflict with FCR-69
Mapping: SWA UI Output Detail Record
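The conflict between the two Benefit Amount elements becomes concrete when the same 11-character value is decoded under each rule. The following is a minimal sketch; the function names are hypothetical, and it assumes each field arrives as a string of exactly 11 ASCII digits.

```python
from decimal import Decimal

FIELD_LENGTH = 11


def _check(raw):
    """Reject anything that is not exactly 11 ASCII digits."""
    if len(raw) != FIELD_LENGTH or not (raw.isascii() and raw.isdigit()):
        raise ValueError(f"expected {FIELD_LENGTH} digits, got {raw!r}")


def decode_implied_decimal(raw):
    """FCR-69 rule: the last two positions are right of the decimal point."""
    _check(raw)
    return Decimal(raw).scaleb(-2)


def decode_whole_dollars(raw):
    """SWA-OD-19 rule: whole dollars only, no decimal."""
    _check(raw)
    return Decimal(raw)


raw_value = "00000123456"
as_fcr_69 = decode_implied_decimal(raw_value)
as_swa_od_19 = decode_whole_dollars(raw_value)
```

The same stored value differs by a factor of 100 depending on which rule is applied, so merging the two elements under one name without reconciling the rule would misstate amounts. Note also that all zeroes in FCR-69 can mean either no benefit amount or unavailable information, which this sketch does not try to distinguish.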
Resolve & Standardize
Identify authoritative sources
Prioritize potential harmonized definitions
Review with subject matter experts
Consolidate if possible
Differentiate if necessary
Isomorphic Domains
We can say that value domains A and B are isomorphic if:
The cardinality of A is equal to the cardinality of B (they have the same number of values)
Both A and B are associated with a conceptual domain C with the same cardinality as A and B
There is an enumeration of value meanings in A that is a 1-1 mapping to concepts in C
There is an enumeration of value meanings in B that is a 1-1 mapping to concepts in C
In other words, if two value domains have a one-to-one mapping to the same conceptual domain values, they are isomorphic
Isomorphic domains can be harmonized
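The conditions above can be checked mechanically when each value domain is held as a mapping from code to value meaning. This is a sketch under that assumption; the function names and the three-state sample are illustrative only.

```python
def is_one_to_one_onto(value_domain, conceptual_domain):
    """True if every code has a distinct meaning and the meanings cover C exactly."""
    meanings = list(value_domain.values())
    return len(set(meanings)) == len(meanings) and set(meanings) == set(conceptual_domain)


def are_isomorphic(domain_a, domain_b, conceptual_domain):
    """Both value domains map one-to-one onto the same conceptual domain."""
    return (is_one_to_one_onto(domain_a, conceptual_domain)
            and is_one_to_one_onto(domain_b, conceptual_domain))


def crosswalk(domain_a, domain_b, conceptual_domain):
    """Translate codes in A to codes in B by way of the shared concepts."""
    if not are_isomorphic(domain_a, domain_b, conceptual_domain):
        raise ValueError("domains are not isomorphic; cannot build a full crosswalk")
    code_in_b = {meaning: code for code, meaning in domain_b.items()}
    return {code: code_in_b[meaning] for code, meaning in domain_a.items()}


# Illustrative three-state sample, not a complete reference list.
CONCEPTS = {"Delaware", "Maryland", "Virginia"}
ALPHA = {"DE": "Delaware", "MD": "Maryland", "VA": "Virginia"}
NUMERIC = {"10": "Delaware", "24": "Maryland", "51": "Virginia"}

alpha_to_numeric = crosswalk(ALPHA, NUMERIC, CONCEPTS)
```

Because dictionary keys are unique, equal cardinality follows once both mappings are one-to-one onto C, so the explicit cardinality conditions do not need a separate check here.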
Domain Congruence
Two value domains are congruent if their value sets intersect and the size of the intersection meets or exceeds a predefined threshold
Example:
FIPS 2-Character State Codes contain values for all US States
USPS 2-Character State Codes contain values for all US States, as well as AA, AE, and AP postal codes for military delivery
Under certain circumstances the two domains can be harmonized
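A congruence test along these lines can be sketched with set operations. The slide does not say how the threshold is expressed, so this sketch assumes an absolute count of shared values; the sample sets are small illustrative subsets, not the full FIPS or USPS lists.

```python
def shared_values(domain_a, domain_b):
    """Values present in both domains."""
    return set(domain_a) & set(domain_b)


def are_congruent(domain_a, domain_b, min_shared):
    """True if the intersection meets or exceeds the predefined threshold."""
    return len(shared_values(domain_a, domain_b)) >= min_shared


def unmatched_values(domain_a, domain_b):
    """Values found in only one domain, which need a decision before harmonizing."""
    return set(domain_a) ^ set(domain_b)


# Illustrative subsets only.
FIPS_SAMPLE = {"DE", "MD", "VA"}
USPS_SAMPLE = {"DE", "MD", "VA", "AA", "AE", "AP"}

congruent = are_congruent(FIPS_SAMPLE, USPS_SAMPLE, min_shared=3)
needs_review = unmatched_values(FIPS_SAMPLE, USPS_SAMPLE)
```

The values left outside the intersection are what determine the "certain circumstances": if military delivery codes never occur in the data being combined, the domains can be treated as one; otherwise they have to be kept distinct or the harmonized domain extended.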
Encouraging a Culture of Semantic Harmony
Small variances in definitions within isolated functions become magnified when data is shared across functions
Establish a level playing field by:
Instituting a common business term glossary
Harmonizing business term definitions
Unifying shared reference data into conceptual domains and corresponding value domains
Socializing use of shared metadata
Establishing standards for future development
Integrating methods for monitoring compliance with standards
Check Out These Resources
www.knowledge-integrity.com
www.dataqualitybook.com
If you have questions, comments, or suggestions, please contact me:
David Loshin
301-754-6350
loshin@knowledge-integrity.com
Establishing standards for future development
Integrate methods for monitoring compliance with standards
copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom
(301)754-6350
20
Check Out These Resources
wwwknowledge-integritycom
wwwdataqualitybookcom
If you have questions comments or suggestions please contact me
David Loshin
301-754-6350
loshinknowledge-integritycom
copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom
(301) 754-6350
21
Business Terms and Data Element Concepts
Map use of a business term to a definition, then to the entity or characteristic
"Customer" is used in reference to the customer entity
"Account Number" is used in reference to an attribute of a customer entity
Need to track the list of data element concepts
© 2014 Knowledge Integrity, Inc. www.knowledge-integrity.com (301) 754-6350
Concept ID   Concept                    Business Term   Definition ID
16-A334      License or Permit Holder   Licensee        BT-977
16-A334      License or Permit Holder   Permit Holder   BT-983
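The term-to-concept table above can be sketched as a small lookup. This is a minimal illustration, not something shown in the slides: the names `GlossaryEntry` and `terms_for_concept` are hypothetical, and only the two rows from the table are used as data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryEntry:
    """One row of the business-term table (field names are illustrative)."""
    concept_id: str
    concept: str
    business_term: str
    definition_id: str


# The two rows shown in the table.
GLOSSARY = [
    GlossaryEntry("16-A334", "License or Permit Holder", "Licensee", "BT-977"),
    GlossaryEntry("16-A334", "License or Permit Holder", "Permit Holder", "BT-983"),
]


def terms_for_concept(entries):
    """Group business terms by concept ID so synonymous terms appear together."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.concept_id].append(entry.business_term)
    return dict(grouped)


if __name__ == "__main__":
    print(terms_for_concept(GLOSSARY))
```

Grouping by concept ID makes it visible that "Licensee" and "Permit Holder" are two terms for one data element concept, each with its own definition ID.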
Data Harmonization
Regulations, Policies, Documents, Forms
Extract & Collect: Extract data definitions from current guidance documents, etc., and categorize definitions using standard terminology
Integrate & Collate: Integrate data elements into a single reference source, then combine common data elements
Identify Anomalies: Identify inconsistent definitions, conflicts in data domains, format variations
Resolve and Standardize: Identify authoritative source for definitions; assign names using naming convention; document in metadata registry
Extract & Collect
Collected metadata includes:
Assigned identifier
Data element name
Related business terms
Definition
Data type
Length
Business rules
Issues/Comments
Reference domain
Standard name
Authoritative sources
Lineage
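One way to hold the collected metadata is a simple record type with one field per item in the list above. This is only a sketch: the class name, field names, and the `missing_fields` helper are assumptions, and the example values are taken from the Claimant State element (SWA-UI-OD-15) that appears later in this deck.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class DataElementRecord:
    """Collected metadata for one data element (field names are illustrative)."""
    identifier: str
    name: str
    related_business_terms: list = field(default_factory=list)
    definition: Optional[str] = None
    data_type: Optional[str] = None
    length: Optional[int] = None
    business_rules: Optional[str] = None
    issues_comments: Optional[str] = None
    reference_domain: Optional[str] = None
    standard_name: Optional[str] = None
    authoritative_sources: list = field(default_factory=list)
    lineage: Optional[str] = None


def missing_fields(record):
    """Return the names of fields that are still empty (None or an empty list)."""
    return [f.name for f in fields(record) if getattr(record, f.name) in (None, [])]


# Example: an element collected without a definition.
record = DataElementRecord(
    identifier="SWA-UI-OD-15",
    name="Claimant State",
    data_type="AN",
    length=2,
    business_rules="If present, contains the Claimant State code from the submitted UI record",
)

if __name__ == "__main__":
    print(missing_fields(record))
```

Listing the empty fields gives a quick way to flag elements that still lack a definition or an authoritative source before they are collated.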
Data Element   Type
FirstName      VARCHAR(35)
LastName       VARCHAR(40)
SSN            CHAR(11)
Telephone      VARCHAR(20)

Data Element   Type
First          VARCHAR(25)
Middle         VARCHAR(25)
Last           VARCHAR(30)
SocialSec      CHAR(9)
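The two layouts above can be compared mechanically once a correspondence between their elements is chosen. The sketch below assumes a correspondence (FirstName to First, LastName to Last, SSN to SocialSec) that the slide does not state; the function names are hypothetical.

```python
import re

SOURCE_A = {
    "FirstName": "VARCHAR(35)",
    "LastName": "VARCHAR(40)",
    "SSN": "CHAR(11)",
    "Telephone": "VARCHAR(20)",
}
SOURCE_B = {
    "First": "VARCHAR(25)",
    "Middle": "VARCHAR(25)",
    "Last": "VARCHAR(30)",
    "SocialSec": "CHAR(9)",
}

# Assumed correspondence between the two layouts (not given on the slide).
ASSUMED_MATCHES = {"FirstName": "First", "LastName": "Last", "SSN": "SocialSec"}


def parse_type(declaration):
    """Split a declaration such as 'VARCHAR(35)' into ('VARCHAR', 35)."""
    match = re.fullmatch(r"([A-Z]+)\((\d+)\)", declaration)
    if match is None:
        raise ValueError(f"unrecognized type declaration: {declaration!r}")
    return match.group(1), int(match.group(2))


def type_differences(a, b, matches):
    """List matched elements whose base type or length differs."""
    diffs = []
    for name_a, name_b in matches.items():
        if parse_type(a[name_a]) != parse_type(b[name_b]):
            diffs.append((name_a, a[name_a], name_b, b[name_b]))
    return diffs


def unmatched(a, b, matches):
    """List elements that have no counterpart in the other layout."""
    only_a = sorted(set(a) - set(matches))
    only_b = sorted(set(b) - set(matches.values()))
    return only_a, only_b


if __name__ == "__main__":
    for diff in type_differences(SOURCE_A, SOURCE_B, ASSUMED_MATCHES):
        print(diff)
    print(unmatched(SOURCE_A, SOURCE_B, ASSUMED_MATCHES))
```

Under the assumed correspondence, every matched pair differs in length, and each layout has one element the other lacks (Telephone and Middle). The CHAR(11) versus CHAR(9) difference plausibly reflects stored hyphens, but the slide does not say so.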
Understanding Reference Data - Assessment
Jean Montard      0  062672
Michael Evans     0  112168
Fran Peterson     1  030276
Pat Lawson        1  041779

J Montard         M  062672
M Evans           M  112168
F Peterson        F  030276
P Lawson          F  041779

Jean J Montard    F  062672
Michael D Evans   F  112168
Fran S Peterson   M  030276
Pat O Lawson      M  041779

• Each of these data sets has matching records for unique individuals
• Each has a code value with a 1:1:1 correspondence
• Understand why the values differ in each data set
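The correspondence between the code values can be checked empirically from the matched records. The sketch below is an illustration under stated assumptions: records are matched on last name plus the final column, and the tuple layout and function names are hypothetical.

```python
# Each record is (last name, final column, code value), taken from the three data sets.
DATA_SET_1 = [("Montard", "062672", "0"), ("Evans", "112168", "0"),
              ("Peterson", "030276", "1"), ("Lawson", "041779", "1")]
DATA_SET_2 = [("Montard", "062672", "M"), ("Evans", "112168", "M"),
              ("Peterson", "030276", "F"), ("Lawson", "041779", "F")]
DATA_SET_3 = [("Montard", "062672", "F"), ("Evans", "112168", "F"),
              ("Peterson", "030276", "M"), ("Lawson", "041779", "M")]


def code_pairs(left, right):
    """Collect the distinct (left code, right code) pairs seen on matched records."""
    right_codes = {(name, key): code for name, key, code in right}
    return {
        (code, right_codes[(name, key)])
        for name, key, code in left
        if (name, key) in right_codes
    }


def is_one_to_one(pairs):
    """True if every left code pairs with exactly one right code, and vice versa."""
    lefts = [left for left, _ in pairs]
    rights = [right for _, right in pairs]
    return len(set(lefts)) == len(lefts) and len(set(rights)) == len(rights)


if __name__ == "__main__":
    print(code_pairs(DATA_SET_1, DATA_SET_2), is_one_to_one(code_pairs(DATA_SET_1, DATA_SET_2)))
    print(code_pairs(DATA_SET_2, DATA_SET_3), is_one_to_one(code_pairs(DATA_SET_2, DATA_SET_3)))
```

The check confirms a one-to-one correspondence in both comparisons, yet the second and third data sets use the same letters with swapped assignments. A structural check like this cannot explain that difference; the meaning of each code still has to be reviewed.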
Integrate & Collate
Columns: ID; Data Element Name; Definition; L (length); T (type); Data Element Rules; Issues/Comments; Mapping; Authoritative Source

ID: CORP-QWODR-25
Data Element Name: Quarterly Wage Employee Wage Amount
L: 11
T: AN
Data Element Rules: This field will contain the information as provided from the Quarterly Wage record submitted for State Filing
Mapping / Authoritative Source: PROPOSED State Corporate Quarterly Wage OUTPUT DETAIL RECORD

ID: EMPR-385
Data Element Name: WAGE AMOUNT
Definition: The amount of a person's wages during a Reporting Quarter
L: 11
T: Signed Numeric
Data Element Rules: 00000000000 through 99999999999. The last two positions are implied to be to the right of the decimal point. Conditional for the following output record: Federal Employee Locate Response Record
Mapping / Authoritative Source: Federal Match System
Identify Anomalies
Inconsistency or ambiguity for similarly-named data elements
Inconsistency of explicit data element business rules
Incomplete or inconsistent reference value domains
Inconsistent formats
Conflicting data types
Abbreviations vs full names
Columns: ID; Data Element Name; Definition; L (length); T (type); Data Element Rules; Mapping

ID: SWA-UI-OD-15
Data Element Name: Claimant State
Definition: Lacking definition
L: 2
T: AN
Data Element Rules: If present, this field will contain the Claimant State code as provided on the submitted UI record
Mapping: State UI Output Detail Record

ID: IRS-DME-15
Data Element Name: Sex
Definition: Lacking definition
L: 1
T: AN
Data Element Rules: Sex Code from the Person Table; will contain spaces if not present on the Person Table
Note: Values not specified
Mapping: IRS DATA MATCH EXTRACT RECORD

ID: FCR-69
Data Element Name: Benefit Amount
Definition: The monetary amount of Unemployment Insurance benefits a person received during a Reporting Period
Note: This definition does not specify the Reporting Period; SWA specify quarter [see below]
L: 11
T: AN
Data Element Rules: 00000000000 through 99999999999. The last two positions are implied to be to the right of the decimal point. This field will contain all zeroes when there is no Benefit Amount or the information is not available. Conditional for the following output record: Federal Match Record

ID: SWA-OD-19
Data Element Name: Benefit Amount
Definition: This field will contain the gross amount of UI benefits, prior to any deductions, paid to a claimant during the reporting quarter, as provided on the UI record submitted to the NDNH
L: 11
T: AN
Data Element Rules: Values are 00000000000 through 99999999999 without decimal. This field is whole dollars only
Note: Potential conflict with FCR-69
Mapping: SWA UI Output Detail Record
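The potential conflict between the two Benefit Amount elements can be made concrete by reading one raw 11-character value under each rule. This is a minimal sketch; the function names are hypothetical, and the sample value is invented for illustration only.

```python
from decimal import Decimal

FIELD_LENGTH = 11  # the L column for both Benefit Amount elements


def _check(raw):
    """Reject values that are not exactly 11 ASCII digits."""
    if len(raw) != FIELD_LENGTH or not (raw.isascii() and raw.isdigit()):
        raise ValueError(f"expected {FIELD_LENGTH} digits, got {raw!r}")


def parse_implied_decimal(raw, places=2):
    """FCR-69 style: the last two positions are to the right of the decimal point."""
    _check(raw)
    return Decimal(int(raw)).scaleb(-places)


def parse_whole_dollars(raw):
    """SWA-OD-19 style: whole dollars only, no decimal positions."""
    _check(raw)
    return Decimal(int(raw))


if __name__ == "__main__":
    sample = "00000123456"  # invented sample value
    print(parse_implied_decimal(sample))  # read as dollars and cents
    print(parse_whole_dollars(sample))    # read as whole dollars
```

The same stored characters differ by a factor of 100 depending on which rule is applied, which is why the two elements cannot be merged without first agreeing on one representation.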
Resolve & Standardize
Identify authoritative sources
Prioritize potential harmonized definitions
Review with subject matter experts
Consolidate if possible
Differentiate if necessary
Isomorphic Domains
We can say that value domains A and B are isomorphic if:
The cardinality of A is equal to the cardinality of B (they have the same number of values)
Both A and B are associated with a conceptual domain C with the same cardinality as A and B
There is an enumeration of value meanings in A that is a 1-1 mapping to concepts in C
There is an enumeration of value meanings in B that is a 1-1 mapping to concepts in C
In other words, if two value domains each have a one-to-one mapping to the values of the same conceptual domain, they are isomorphic
Isomorphic domains can be harmonized
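The conditions above can be checked directly when each value domain is written as a mapping from value to concept. The sketch below is an illustration only: the yes/no domains are hypothetical examples, and the function names are assumptions.

```python
def is_bijection_onto(value_meanings, concepts):
    """True if the value -> concept mapping is one-to-one and covers every concept."""
    meanings = list(value_meanings.values())
    return len(set(meanings)) == len(meanings) and set(meanings) == set(concepts)


def are_isomorphic(domain_a, domain_b, concepts):
    """Check equal cardinality and a 1-1 mapping from each domain onto the concepts."""
    return (
        len(domain_a) == len(domain_b) == len(concepts)
        and is_bijection_onto(domain_a, concepts)
        and is_bijection_onto(domain_b, concepts)
    )


def translation(domain_a, domain_b):
    """Build the A -> B value translation by going through the shared concepts."""
    concept_to_b = {concept: value for value, concept in domain_b.items()}
    return {value: concept_to_b[concept] for value, concept in domain_a.items()}


# Hypothetical example: two encodings of the same yes/no conceptual domain.
CONCEPTS = {"yes", "no"}
DOMAIN_A = {"Y": "yes", "N": "no"}
DOMAIN_B = {1: "yes", 0: "no"}

if __name__ == "__main__":
    if are_isomorphic(DOMAIN_A, DOMAIN_B, CONCEPTS):
        print(translation(DOMAIN_A, DOMAIN_B))
```

When the check passes, the translation table is what harmonization would use to convert values from one domain to the other without loss.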
Domain Congruence
Two value domains are congruent if their value sets intersect and the size of the intersection meets or exceeds a predefined threshold
Example:
FIPS 2-Character State Codes contain values for all US states
USPS 2-Character State Codes contain values for all US states, as well as the AA, AE, and AP postal codes for military delivery
Under certain circumstances the two domains can be harmonized
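A congruence test can be sketched as a set-overlap check against a threshold. The slide does not say how the size of the intersection is measured, so this sketch assumes it is taken as a fraction of the union of the two value sets; the function names are hypothetical, and the code sets are a small illustrative sample rather than the full FIPS or USPS lists.

```python
def overlap_ratio(domain_a, domain_b):
    """Size of the intersection as a fraction of the union (an assumed measure)."""
    a, b = set(domain_a), set(domain_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def are_congruent(domain_a, domain_b, threshold):
    """True if the overlap meets or exceeds the predefined threshold."""
    return overlap_ratio(domain_a, domain_b) >= threshold


# Illustrative sample only: four state codes, plus the three military codes.
STATE_SAMPLE = {"MD", "VA", "NY", "CA"}
USPS_SAMPLE = STATE_SAMPLE | {"AA", "AE", "AP"}

if __name__ == "__main__":
    print(overlap_ratio(STATE_SAMPLE, USPS_SAMPLE))
    print(are_congruent(STATE_SAMPLE, USPS_SAMPLE, threshold=0.5))
```

With the complete code lists the overlap would be much higher than in this small sample, since the military codes are only three extra values. Whether that is enough to harmonize depends on the threshold chosen and on whether the extra values matter to the consuming application.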
Encouraging a Culture of Semantic Harmony
Small variances in definitions in isolated functions become magnified when data is shared across functions
Establish a level playing field by:
Instituting a common business term glossary
Harmonizing business term definitions
Unifying shared reference data into conceptual domains and corresponding value domains
Socializing use of shared metadata
Establishing standards for future development
Integrating methods for monitoring compliance with standards
Check Out These Resources
www.knowledge-integrity.com
www.dataqualitybook.com
If you have questions, comments, or suggestions, please contact me:
David Loshin
301-754-6350
loshin@knowledge-integrity.com
15
Integrate amp Collate
CORP-
QWODR-25
Quarterly Wage
Employee Wage
Amount
11 AN This field will contain the
information as provided from
the Quarterly Wage record
submitted for State Filing
PROPOSED State
Corporate
Quarterly Wage
OUTPUT DETAIL
RECORD
EMPR-385 WAGE AMOUNT The amount of a
personrsquos wages
during a
Reporting
Quarter
11 Signed
Numeric
00000000000 through
99999999999 The last two
positions are implied to be to
the right of the decimal point
Conditional for the following
output record Federal
Employee Locate Response
Record
Federal Match
System
ID Data Element
Name
Definition L T Data Element Rules IssuesCom
ments
Mapping Authoritative
Source
CORP-
QWODR-25
Quarterly Wage
Employee Wage
Amount
11 AN This field will contain the
information as provided from
the Quarterly Wage record
submitted for State Filing
PROPOSED State
Corporate
Quarterly Wage
OUTPUT DETAIL
RECORD
EMPR-385 WAGE AMOUNT The amount of a
personrsquos wages
during a
Reporting
Quarter
11 Signed
Numeric
00000000000 through
99999999999 The last two
positions are implied to be to
the right of the decimal point
Conditional for the following
output record Federal
Employee Locate Response
Record
Federal Match
System
copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom
(301)754-6350
Identify Anomalies
Inconsistency or ambiguity for similarly-named data elements
Inconsistency of explicit data element business rules
Incomplete or inconsistent reference value domains
Inconsistent formats
Conflicting data types
Abbreviations vs full names
copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom
(301)754-6350
16
ID Data Element
Name
Definition L T Data Element Rules Mapping
SWA-UI-
OD-15
Claimant State Lacking definition 2 AN If present this field will contain the Claimant
State code as provided on the submitted UI
record
State UI Output Detail
Record
IRS-DME-
15
Sex Lacking definition 1 AN Sex Code from the Person Table will contain
spaces if not present on the Person Table
Values not specified
IRS DATA MATCH
EXTRACT RECORD
FCR-69 Benefit Amount The monetary amount of Unemployment
Insurance benefits a person received during
a Reporting Period This definition
does not specify the Reporting
Period SWA specify quarter
[see below]
11 AN 00000000000 through 99999999999 The last
two positions are implied to be to
the right of the decimal point This field
will contain all zeroes when there is no Benefit
Amount or the information is not available
Conditional for the following output record bull
Federal Match Record
SWA-OD-
19
Benefit Amount This field will contain the gross amount of UI
benefits prior to any deductions paid to a
claimant during the reporting quarter as
provided on the UI record submitted to the
NDNH
11 AN Values are 00000000000 through 99999999999
without decimal
This field is whole dollars only
Potential conflict with FCR-69
SWA UI Output Detail
Record
Resolve amp Standardize
Identify authoritative
sources
Prioritize potential
harmonized definitions
Review with subject matter experts
Consolidate if possible
Differentiate if necessary
copy 2014 Knowledge Integrity Inc wwwknowledge-integritycom
(301)754-6350
17
Isomorphic Domains
We can say that value domains A and B are isomorphic if:
The cardinality of A is equal to the cardinality of B (they have the same number of values)
Both A and B are associated with a conceptual domain C with the same cardinality as A and B
There is an enumeration of value meanings in A that is a 1-1 mapping to concepts in C
There is an enumeration of value meanings in B that is a 1-1 mapping to concepts in C
In other words, if two value domains have a one-to-one mapping to the same conceptual domain values, they are isomorphic
Isomorphic domains can be harmonized
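The conditions above can be sketched as a small Python check. This is an illustration under assumptions, not a definitive implementation: each value domain is represented as a dictionary from code to concept, and the two code sets shown are hypothetical examples rather than values from any of the records discussed earlier.

```python
def maps_one_to_one_onto(value_to_concept: dict, concepts: set) -> bool:
    """True if every value maps to a distinct concept and all concepts are covered."""
    targets = list(value_to_concept.values())
    return len(targets) == len(set(targets)) and set(targets) == concepts


def is_isomorphic(a_to_concept: dict, b_to_concept: dict, concepts: set) -> bool:
    """Check equal cardinality plus a 1-1 mapping from each domain to C."""
    same_size = len(a_to_concept) == len(b_to_concept) == len(concepts)
    return (same_size
            and maps_one_to_one_onto(a_to_concept, concepts)
            and maps_one_to_one_onto(b_to_concept, concepts))


def crosswalk(a_to_concept: dict, b_to_concept: dict) -> dict:
    """Translate A values to B values by going through the shared concepts."""
    concept_to_b = {concept: b for b, concept in b_to_concept.items()}
    return {a: concept_to_b[concept] for a, concept in a_to_concept.items()}


# Hypothetical conceptual domain and two hypothetical encodings of it.
concepts = {"male", "female", "unknown"}
domain_a = {"M": "male", "F": "female", "U": "unknown"}
domain_b = {"1": "male", "2": "female", "9": "unknown"}

if is_isomorphic(domain_a, domain_b, concepts):
    print(crosswalk(domain_a, domain_b))
```

When the check passes, the crosswalk gives a lossless translation between the two value domains, which is what makes harmonizing them safe.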
Domain Congruence
Two value domains are congruent if their value sets intersect and the size of the intersection meets or exceeds a predefined threshold
Example:
FIPS 2-Character State Codes contain values for all US States
USPS 2-Character State Codes contain values for all US States, as well as AA, AE, and AP postal codes for military delivery
Under certain circumstances, the two domains can be harmonized
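A congruence test of this kind can be sketched in a few lines of Python. The sketch assumes the threshold is expressed as a count of shared values, which is one possible reading of "size of the intersection". The code sets below are deliberately small illustrative samples, not the complete FIPS or USPS lists, and the function name is hypothetical.

```python
def are_congruent(domain_a: set, domain_b: set, min_shared: int) -> bool:
    """True if the value sets intersect and share at least min_shared values."""
    shared = domain_a & domain_b
    return len(shared) > 0 and len(shared) >= min_shared


# Illustrative samples only: three state codes, plus the military codes in USPS.
fips_sample = {"MD", "VA", "NY"}
usps_sample = {"MD", "VA", "NY", "AA", "AE", "AP"}

congruent = are_congruent(fips_sample, usps_sample, min_shared=3)
only_in_usps = usps_sample - fips_sample  # values that need a decision before harmonizing
print(congruent, sorted(only_in_usps))
```

The values that fall outside the intersection are the ones to review: whether the two domains can be harmonized depends on whether the consuming application ever needs them.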
Encouraging a Culture of Semantic Harmony
Small variances in definitions within isolated functions become magnified when data is shared across functions
Establish a level playing field by:
Instituting a common business term glossary
Harmonizing business term definitions
Unifying shared reference data into conceptual domains and corresponding value domains
Socializing use of shared metadata
Establishing standards for future development
Integrating methods for monitoring compliance with standards
Check Out These Resources
www.knowledge-integrity.com
www.dataqualitybook.com
If you have questions, comments, or suggestions, please contact me:
David Loshin
301-754-6350
loshin@knowledge-integrity.com