Defining a Person Metadata Model to Improve Data Quality
8th February 2011
Ian Woodrow
Atos, Atos and fish symbol, Atos Origin and fish symbol, Atos Consulting, and the fish itself are registered trademarks of Atos Origin SA. August 2006© 2006 Atos Origin. Confidential information owned by Atos Origin, to be used by the recipient only. This document or any part of it, may not be reproduced, copied, circulated and/or distributed nor quoted without prior written approval from Atos Origin.
Make things as simple as possible, but no simpler
Einstein
Essentially, all models are wrong, but some are useful
Box & Draper
If you can’t measure it, you can’t manage it…
Various known
Your Presenter
Role – Service/Project Manager and Information Analyst
Employment
» National Audit Office (E&AD)
» Capgemini
» Freelance
» Atos Origin
Projects
» Data Migration and Cleansing
» Data Standards Implementations
» Sales and Account Development
» Business and Data Analysts
» Accruals Accounting
Recent Training
» TOGAF9 Enterprise Data Architect
» Prince2 Practitioner
» Value Analysis.
Introduction
» Information Management Service
» Corporate Data Model
» Person Metadata Model
» Data Standards
» Data Profile Reporting
» Questions and Queries
» Contact Me.
Information Management Service
Corporate Data Model
PEOPLE/ORGANISATION
DOCUMENTS
ORGANISATION
COMPANY
SERVICES
EVENTS
LOCATIONS
Applications
» Registry
» Membership
» Benefit
» Biometrics
» Resources
» Overseas
Data Standards
Select Target Attribute
» Identify Local Application Alternatives
» Look for Commonality
» Review Government Standards
» Review UK National Standards
» Review Other Government Standards (e.g. NIST)
» Review International Standards
» Assess Applicability
» Decide Solution
» Confirm Solution
» Publish Solution
Target Attribute Metadata
» Datatype
» Length
» Description
» Values Lists
» Constraints
» Defaults
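The target attribute metadata listed above (datatype, length, description, values list, constraints, defaults) could be held as one record per attribute. The sketch below is only an illustration: the class name `TargetAttribute`, its field names, and the `violations` helper are assumptions, not part of the CDM or of any tool named in this presentation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TargetAttribute:
    """One target attribute and its metadata (hypothetical structure)."""
    name: str
    datatype: str
    length: Optional[int] = None                           # maximum length, if any
    description: str = ""
    values_list: List[str] = field(default_factory=list)   # allowed values, if any
    constraints: List[str] = field(default_factory=list)   # free-text rules
    default: Optional[str] = None

    def violations(self, value: str) -> List[str]:
        """Return the reasons why `value` does not fit this attribute."""
        problems = []
        if self.length is not None and len(value) > self.length:
            problems.append(f"longer than {self.length} characters")
        if self.values_list and value not in self.values_list:
            problems.append("not in values list")
        return problems


# Illustrative record; the length and values here are made up for the example.
gender = TargetAttribute(name="Gender", datatype="string", length=1,
                         values_list=["M", "F"])
```

Calling `gender.violations("MALE")` reports both the length and the values-list problem, which is the kind of check a published standard makes possible.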
Tooling: Best of Breed Solution
PRODUCTS
» Erwin Data Modeller
» Model Manager
» Process Modeller (BPWin)
» Discovery
CHANNELS
» Intranet
» Tech Doc Library
CDM Person Subject Area
Cross-reference CDM to Applications
ATTRIBUTE                     CDM                   REGISTRY                  MEMBERSHIP
Full Name
Family Name                   PERSON.FAMILY NAME    REGISTER.PRINCIPAL NAME   PEOPLE.SURNAME
Given Names
Date of Birth
Gender
Nationality(s)
Language (s)
Alternative Details
Organisation Start Date
Organisation Reference
International ID (Passport)
NI Number
First Line Address
Postcode
Telephone Number(s)
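The cross-reference above can be kept as a simple lookup so that unmapped attributes are easy to list. This is a minimal sketch: only the Family Name row comes from the slide, and the dictionary layout and both function names are assumptions made for illustration.

```python
# Cross-reference of CDM attributes to application fields.
# Only the Family Name row is taken from the slide; the other rows were
# not populated there, so they are left empty here.
CROSS_REFERENCE = {
    "Family Name": {
        "CDM": "PERSON.FAMILY NAME",
        "REGISTRY": "REGISTER.PRINCIPAL NAME",
        "MEMBERSHIP": "PEOPLE.SURNAME",
    },
    "Given Names": {},
    "Date of Birth": {},
}


def application_field(attribute, system):
    """Return the field holding `attribute` in `system`, or None if unmapped."""
    return CROSS_REFERENCE.get(attribute, {}).get(system)


def unmapped(system):
    """List the attributes that have no known field in `system`."""
    return [name for name, fields in CROSS_REFERENCE.items()
            if system not in fields]
```

`unmapped("REGISTRY")` then gives the attributes still to be traced in that application.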
Publishing the Results
Person Metadata Model – Vision
» To Create a Whole Customer View
» Person Identity
» Key Events
» Sufficient for Identity Resolution
» Organisation Facts
» Siloed Data Sources
» Several Suppliers
» Considering:
» Master Data Management
» Service Oriented Architecture
» Technical Facts
» Different Databases
- SQL Server
- Oracle
- Access
» No ETL Tool on server
Benefits of Person Metadata Model Approach
» Address data shortfall in existing applications (enrichment)
» Form basis for new application data models
» Benchmark internal developments and suppliers' offerings
» Baseline for Profiling
Person Metadata Model
Person Baseline Analysis
Applications (21 existing + new)
CDM Attribute Scoring
Application Data Model to CDM Analysis
Data Standards - Approach
Select Target Attribute
» Identify Local Application Alternatives
» Look for Commonality
» Review Government Standards
» Review UK National Standards
» Review Other Government Standards (e.g. NIST)
» Review International Standards
» Assess Applicability
» Decide Solution
» Confirm Solution
» Publish Solution
Data Standards – Observations and Issues
» Local Standards
» Name
- Two Fields (Given Names and Family Name)
- Single Composite Fields
» Problems
- Which is the Family Name (single field option)
- Cultures with no Family Name
- Length, Which Character set to Use
- Applying National Standards to International Situations
» Gender
- Single Character pretty universally (but also text)
- Which standard to apply (M/F or H/D, not known, not disclosed, not specified)
Data Standards: Given Names Resolution
» eGIF
» Originally CDM mandated to be compliant with eGIF
» Too short
» Application Standard
» Acknowledged as almost long enough!
» New Standard for length 100 characters prevails
» Promoted the publication on CDM website/Release Note
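One way to test a candidate length against existing data is to measure how many values would fit. The sketch below assumes the 100-character standard from the slide; the function names are hypothetical, and length is counted in characters rather than bytes, which is itself an assumption given the character set question raised earlier.

```python
GIVEN_NAMES_MAX_LENGTH = 100  # new standard length stated on the slide


def fit_rate(values, max_length=GIVEN_NAMES_MAX_LENGTH):
    """Return the percentage of non-null values that fit within max_length."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    fitting = sum(1 for v in present if len(v) <= max_length)
    return 100.0 * fitting / len(present)


def longest(values):
    """Return the length of the longest non-null value (0 if there are none)."""
    return max((len(v) for v in values if v is not None), default=0)
```

Running `fit_rate` with a shorter candidate length shows how many existing names that length would truncate.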
Data Standards: Gender Resolution
» Application examples
» Values: M/F, M/F/U, M/F/D
» eGIF:
» 4 values 0, 1 (Male), 2 (Female), 9
» Use of ISO/IEC 5218:2004 standard enables international interchange
» Standard for 4 UK values with mappings
» Promoted the publication on CDM website/Release Note
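A mapping from application values to the four ISO/IEC 5218 codes might look like the sketch below. The code meanings (0 not known, 1 male, 2 female, 9 not applicable) are from the standard; the `LOCAL_TO_ISO` table is a hypothetical example and not the mapping that was published, and treating a missing value as 0 is likewise an assumption. Values with no agreed mapping, such as `D`, return `None` so that they are reviewed rather than guessed.

```python
# ISO/IEC 5218 codes for the representation of human sexes.
ISO_5218 = {0: "Not known", 1: "Male", 2: "Female", 9: "Not applicable"}

# Hypothetical local-to-standard mapping, for illustration only.
LOCAL_TO_ISO = {"M": 1, "F": 2, "U": 0}


def to_iso_5218(raw):
    """Map a local gender value to an ISO/IEC 5218 code.

    Returns None when the value has no agreed mapping.
    """
    if raw is None or raw.strip() == "":
        return 0  # assumption: a missing value is recorded as "Not known"
    return LOCAL_TO_ISO.get(raw.strip().upper())
```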
Data Profile Reporting
» Standard Data Profiling
- Field Type Distribution
- Field Uniqueness
- Relationship Integrity
» CDM Profiling
- Person Baseline
- Event Baseline
» Issues
- Root Cause Analysis
- People/Process/Technology Approach
Field Type Distribution Report
Presents the percentage of rows within a particular field that fall into any of the following field type categories:
» Null
» Integer
» String
» Decimal
» Space
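A field type distribution over the five categories above can be computed in a few lines. This is a sketch under stated assumptions: the classification rules (regular expressions for Integer and Decimal, and empty or whitespace-only text counted as Space) are illustrative choices, not the rules used by the profiling tool behind the report.

```python
import re
from collections import Counter

CATEGORIES = ["Null", "Integer", "String", "Decimal", "Space"]
INTEGER = re.compile(r"[+-]?\d+")
DECIMAL = re.compile(r"[+-]?(\d+\.\d*|\.\d+)")


def classify(value):
    """Assign one value to a field type category (assumed rules)."""
    if value is None:
        return "Null"
    text = str(value)
    if text.strip() == "":
        return "Space"  # assumption: empty and whitespace-only count as Space
    if INTEGER.fullmatch(text):
        return "Integer"
    if DECIMAL.fullmatch(text):
        return "Decimal"
    return "String"


def field_type_distribution(values):
    """Return the percentage of rows that fall into each category."""
    values = list(values)
    counts = Counter(classify(v) for v in values)
    total = len(values)
    return {cat: (100.0 * counts[cat] / total if total else 0.0)
            for cat in CATEGORIES}
```

A field that is expected to hold dates or identifiers but shows a large String or Space share is a candidate for root cause analysis.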
Field Uniqueness Report
Presents the percentage uniqueness of a column
» Null Percentage
» All Distinct e.g. ID fields
» Reference Data (few distinct values)
» Form a view of adequate uniqueness
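The uniqueness figures above can be sketched as follows. The definition used here (distinct non-null values divided by non-null rows) is an assumption; the report may define percentage uniqueness differently, and the function and key names are hypothetical.

```python
def field_uniqueness(values):
    """Return null percentage, distinct count and uniqueness percentage."""
    values = list(values)
    total = len(values)
    non_null = [v for v in values if v is not None]
    distinct = len(set(non_null))
    return {
        "null_pct": 100.0 * (total - len(non_null)) / total if total else 0.0,
        "distinct": distinct,
        # assumption: uniqueness is measured over non-null rows only
        "unique_pct": 100.0 * distinct / len(non_null) if non_null else 0.0,
    }
```

An ID field should come out at 100 per cent unique, while a reference data field such as gender should show only a handful of distinct values.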
Relationship Integrity Report
Presents the widows and orphans by number/percentage
[Chart: Join Analysis of TABLE 1.FIELD 1 and TABLE 2.FIELD 2, valid as of xx/xx/xxxx. Bars on a logarithmic axis for table rows, loaded values, matching values and non-matching values on each side of the join. Legible data labels include 3016, 1247, 1247 and 1769; the remaining labels are not recoverable.]
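The join analysis behind a relationship integrity report can be approximated by comparing the distinct key values on each side of a relationship. In this sketch the function name and result keys are hypothetical, nulls are ignored, and the unmatched values on either side stand for the widows and orphans; which side is called which is not stated in the slide.

```python
def join_analysis(left_values, right_values):
    """Compare distinct key values from two fields that should join."""
    left = {v for v in left_values if v is not None}
    right = {v for v in right_values if v is not None}
    left_only = left - right     # values in field 1 with no match in field 2
    right_only = right - left    # values in field 2 with no match in field 1

    def pct(part, whole):
        return 100.0 * len(part) / len(whole) if whole else 0.0

    return {
        "matching": len(left & right),
        "left_only": len(left_only),
        "right_only": len(right_only),
        "left_only_pct": pct(left_only, left),
        "right_only_pct": pct(right_only, right),
    }
```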
Profiling Examples: Given Names
» Presence – 101 Null
» Unique Values – 99,999 N/A
» Patterns – 4,000
» Leading Spaces – 100
» Contains Digits – 50
» Initials Only – 25
» Trailing Spaces – 50
» Leading Punctuation – 100
» Non-printable Characters – 10
» Minimum Value – 0
» Maximum Value – …
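Counts like those above can be produced by running a set of simple checks over the field. The sketch below is illustrative only: the rule for each check, in particular what counts as "Initials Only", is an assumption, and the function names are hypothetical.

```python
import string


def initials_only(text):
    """True when every token is a single letter, optionally followed by '.'."""
    tokens = text.split()
    return bool(tokens) and all(
        len(t.rstrip(".")) == 1 and t[0].isalpha() for t in tokens
    )


CHECKS = {
    "Leading Spaces": lambda s: s.startswith(" "),
    "Trailing Spaces": lambda s: s.endswith(" "),
    "Contains Digits": lambda s: any(c.isdigit() for c in s),
    "Initials Only": initials_only,
    "Leading Punctuation": lambda s: s != "" and s[0] in string.punctuation,
    "Non-printable Characters": lambda s: any(not c.isprintable() for c in s),
}


def profile(values):
    """Count nulls, distinct values and the rows flagged by each check."""
    values = list(values)
    non_null = [v for v in values if v is not None]
    report = {
        "Null": len(values) - len(non_null),
        "Unique Values": len(set(non_null)),
    }
    for label, check in CHECKS.items():
        report[label] = sum(1 for v in non_null if check(v))
    return report
```

Each non-zero count points to rows worth examining with the business before any cleansing rule is written.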
Profiling Examples: Gender
» Presence – 25,000 Null
» Unique Values – 25 values
» Patterns – 8
» Leading Spaces – 1
» Contains Digits – N/A
» Initials Only – 2
» Trailing Spaces – 3
» Leading Punctuation – 4
» Non-printable Characters – 5
» Minimum Value – male
» Maximum Value – vnm
Gender – Unique Values
CDM Profiling – Report and Presentation
» Report
» Executive Summary
» Introduction
» Process Overview
» Profile Scope
» How to read the Document
» Observations
» Data Facts
» Standard Reports
» Person Baseline Analysis
» Business Drivers
» Profile Objective
» Profile Approach
» Findings
» Recommendations
» Assumptions
» Observation Details
» Next Steps
» Presentation
» Introduction
» Executive Summary
» Observations
- People
- Process
- Technology
» Person Baseline Analysis
» Next Steps
Addressing the Issues
» Root Cause Analysis
» Fact Based
» To preclude problem recurring
» Examine
- Code
- Specifications
- UI and UI Standards
- Use of dropdowns
- Field Validation
- Talk with Users
» General BA Tools.
» Reporting Approach
» People
» Process
» Technology
For more information please contact:
Ian Woodrow
m +44 (0)7974 674045 [email protected]
Atos Origin (UK)
4 Triton Square
NW1 3HG, London
www.atosorigin.com