The First Step in Information Management – DAMA NY
www.firstsanfranciscopartners.com
The Benefits and Uses of an Enterprise Data Model
Malcolm Chisholm Ph.D., Chief Innovation Officer
First San Francisco Partners
A Brief History of Enterprise Data Models
The Evolution of the Data Profession
pg 2 © 2017 First San Francisco Partners www.firstsanfranciscopartners.com Proprietary and Confidential
1970’s: First glimmerings of life
• Relational database theory emerges
• Early adoption of RDBMS’s
• Development of theories of design (normalization)
• Development of methodologies for design (Chen)
1980’s: Early world
• Wide adoption of RDBMS’s
• Use of SQL grows
• Tools available for data modeling
• PC revolution
• Downsizing
• Promise of the Corporate Data Model
1990’s: The Good Times
• Huge growth in IT carries Data Administration (DA) with it
• More tools, e.g. Erwin
• Data warehousing begins
• ERP implementations
• Metadata management
• But little delivery of Corporate Data Model
2000+: Mass Extinction Events
• Internet bubble bursts
• 9/11
• Formerly “Neanderthal” bricks-and-mortar businesses re-emerge
• DA not seen as delivering
• DA cutbacks
’00-’05: Dark Age
• Very little activity
• Data warehousing continues
• Stirrings of Master Data Management (MDM)
• Stirrings of Data Governance (DG)
2005 onward: Golden Age of Data
• Data Governance (DG) leaps to prominence
• Financial Crisis emphasizes data needs
• Big Data emerges
• Data at the center of business models (Google, Facebook)
• Legal challenges in Data Management
A Longer Term Trend: Process-centricity to Data-centricity
[Figure: timeline running from the 1960’s to today]
• Originally, IT sought to automate processes that had hitherto been manual
• Today, the emphasis is on unlocking value inherent in the data
• Big Data, Data Lakes are just more steps in this evolution
• But how do you describe the data?
The Corporate Data Model
• The Corporate Data Model flourished in the later 1980’s and early 1990’s
• The idea was that if there was one database design for the entire enterprise, then all systems (applications) could be built on it
• The benefit would be automatic integration
The Demise of The Corporate Data Model
• The CDM was based on mainframe-era thinking – that all applications were developed in-house.
• It was also heavily oriented to operational systems.
• But packages became available for operational systems in nearly every niche. These came with fixed database designs.
• Also, the data modelers doing the CDM could not keep up with changes in the business – that affected parts of the CDM already “completed”.
• Doing the CDM took years. This was too long for management with shorter-term horizons.
But Things Are Changing
• In the accelerating shift to data-centricity, executives are beginning to understand that:
  • There are successful business models that put data at their center
  • The data in their enterprise is not in a state in which it can be used
  • New businesses may be able to use data to compete with the enterprise
• So we need enterprise data models – but what are they supposed to do for us?
[Figure: the average enterprise’s data resource (marked “?”) contrasted with highly successful data-centric companies that everyone interacts with]
What Should An Enterprise Data Model Help With – 1/4
A User of Data Should Be Able To:
Know What Data The Enterprise Manages
Know What The Data Means
  • Including Calculations and Derivations
Know Where The Data Is Stored
  • At a Minimum The Authoritative Source
Know Who Is Allowed To Access The Data (Security)
  • If They Are Allowed to Know This
Know How to Get The Data
Know What Can Be Done with The Data (Privacy, Compliance)
Know What Decisions Have Been Made about The Data
  • Governance, Stewardship
What Should An Enterprise Data Model Help With – 2/4
A User of Data Should Be Able To:
Know What Quality Issues Exist with The Data
Know Who Else is Interested in The Data
  • Stakeholder Community
Know Who to Contact if There Are Issues with The Data
  • Know What Processes Exist to Resolve Issues
What Should An Enterprise Data Model Help With – 3/4
With This Knowledge A Data User Will Be Able To:
Use the data they rely on to perform their assigned responsibilities
Ensure that the data can always be turned into information
Participate effectively in stewardship functions that assure the quality, privacy, security, and compliance of the data
What Should An Enterprise Data Model Help With – 4/4
The Enterprise as A Whole Will Benefit as Data Users Also:
Use the data to increase the efficiency of the enterprise’s operations, including re-engineering of business processes to take advantage of improved data understandability, availability, and quality.
Use the data to meet the enterprise’s business goals, including adaptation to changing market, regulatory, and other environments, and agility in responding to new opportunities.
Use the data to mitigate risk in the enterprise, including reduction of operational risk inherent in the data itself as data quality improves.
Semantics and The Enterprise Data Model
Definitions of “Semantics”
…IDEF1X is a semantic data modeling technique. It is used to produce a graphical information model which represents the structure and semantics of information within an environment or system. Use of this standard permits the construction of semantic data models which may serve to support the management of data as a resource, the integration of information systems, and the building of computer databases.
The doctrine of historical word-meanings…
• “Semantics” has traditionally meant the meaning of words in human language, but this is not good enough for data
But Not All Semantics are in Traditional Data Models
[Figure: Acme Widget Co Inc, Global Widgets Inc, and Mega Investors LLC, linked by the relationships “Is a Subsidiary of” and “Is Majority Owner of”]

Organization Table
Org ID | Org Name | Related Org ID | Relation Type Code
111 | Acme Widget Co Inc | 222 | SU
222 | Global Widgets Inc | 333 | MO
333 | Mega Investors LLC | - | -

Org Relation Type Table
Code | Description
MO | Is Majority Owner of
SU | Is a Subsidiary of

[Figure: Data Model, Database, Reality]
• A database is one level of abstraction removed from reality, and a data model is two levels removed.
• A data model aims to optimize data storage, e.g. insulate data structures from business change
• Therefore, not all business semantics are captured in a data model.
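As a rough illustration of the point above, the sketch below loads the two slide tables into Python and shows that the business meaning ("subsidiary", "majority owner") only appears once the reference data rows are read. The variable and function names are assumptions made for this example, not part of the presentation.

```python
# Hypothetical in-memory copies of the two tables on the slide.
# Each organization row: (org_id, org_name, related_org_id, relation_type_code)
ORGANIZATION = [
    (111, "Acme Widget Co Inc", 222, "SU"),
    (222, "Global Widgets Inc", 333, "MO"),
    (333, "Mega Investors LLC", None, None),
]

# Reference data: the business meaning of each code lives here, as data.
ORG_RELATION_TYPE = {
    "MO": "Is Majority Owner of",
    "SU": "Is a Subsidiary of",
}


def describe_relationships(organization, relation_type):
    """Turn rows into readable statements by looking up each relation code."""
    names = {org_id: name for org_id, name, _, _ in organization}
    statements = []
    for _, name, related_id, code in organization:
        if related_id is None:
            continue  # this row records no relationship
        statements.append(f"{name} {relation_type[code]} {names[related_id]}")
    return statements


for statement in describe_relationships(ORGANIZATION, ORG_RELATION_TYPE):
    print(statement)
```

Nothing in the column names says that subsidiaries or majority owners exist. A new kind of relationship would arrive as one more row in the relation type table, with no change to the structure that a data model diagram would show.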
The Scope of Semantics
Terms / Concepts: The concepts used in the business and the terms used to identify them – and their definitions
Hierarchies: How individual things (instances) are associated at multiple levels for specific business needs
Relationships: Other business relationships outside taxonomies and hierarchies
Ontologies: Each is a “view” of the business world (and business information) that is required to meet a specific business need
Taxonomies: The relationships of general with specific concepts
Business Rules: Atomic units of logic that govern behavior of concepts and relationships
• Semantics: the understanding of information without any concern about how it may be stored as data
• On this view there are many semantic models, rather than one per enterprise
Subject Area Models
Subject Area Model
• A fundamental enterprise data model showing major areas of enterprise data concerns
• Within each area there is terminological consistency
• Aids in analytics, e.g. Data Marts are subject-oriented
• BUT: There can be many of these, e.g. for the enterprise as a whole, for metadata (shown here), for the world outside the enterprise (reference data) – and at lower levels too
Example of Metadata Subject Area Model
Advantages
1. Standardizes the metadata in the enterprise at a high level.
2. Can be used to prioritize projects and programmes.
3. Useful in communications.
4. Shows what metadata is being produced by subject area. This is a high-level inventory.
5. Can distinguish data-related metadata from other data.
6. Within each subject area, definitions should be constant. There may be variations across subject areas.
7. Subject areas are candidates for conceptual models.
Disadvantages
1. It is a taxonomy and can be argued over. No one taxonomy will satisfy every perspective. We use data categorization for that (see later). The Subject Area Model should be the most common and natural perspective.
2. The Subject Area Model often has more things expected of it than it can deliver. Managing expectations can be tricky.
Take the Subject Area Model to Lower Levels of Detail
• Take it as far as you want, but it is still high level
• Stop when you get to the level of major entities
• Terminology becomes a problem
Who Owns The Subject Area Models?
• Subject Area Models are produced by a modeling exercise, so staff with modeling expertise need to be involved in doing them
• Subject Area Models are important artifacts that have a lot of impact, so Data Governance and Data Architecture need to be involved
Taxonomies
What is a Taxonomy?
Greek root = “Orderly Arrangement” + “A Law”
“The laws and principles of the classifying of natural objects; that department of science which treats of classification” – Baldwin’s Dictionary of Philosophy
Taxonomies must involve some element of hierarchy. Often it is only at two levels.
• A lower level model than a Subject Area Model
• Often implemented as reference data
• Appears simple, but is it?
Taxonomies Are Actually More Complex Models
• Classification rules are often needed
• Thus, these enterprise models are not simply diagrams
• Usage rules may be needed too – people can misuse taxonomies
[Figure: Applicants are classified under Auto Risk as High Risk, Medium Risk, or Low Risk]
Classification Rules:
• If Credit Score < 620 = High Risk
• If Credit Score >= 620 & Credit Score < 680 = Medium Risk
• If Credit Score >= 680 = Low Risk
Usage Rules: E.g. how to price the 3 different classes of risk
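The classification rules on this slide can be sketched as executable logic, which is one way to see why a taxonomy is more than a diagram. This is a minimal sketch: the thresholds come from the slide, while the rule-table layout and the function name are assumptions for illustration. Usage rules are not sketched because the slide gives no pricing values.

```python
# Classification rules from the slide, held as data: (exclusive upper bound, class).
# None means "no upper bound".
AUTO_RISK_RULES = [
    (620, "High Risk"),
    (680, "Medium Risk"),
    (None, "Low Risk"),
]


def classify_auto_risk(credit_score, rules=AUTO_RISK_RULES):
    """Return the first risk class whose upper bound the score falls below."""
    for upper_bound, risk_class in rules:
        if upper_bound is None or credit_score < upper_bound:
            return risk_class
    raise ValueError("rules must end with an unbounded class")


for score in (619, 620, 679, 680):
    print(score, classify_auto_risk(score))
```

Keeping the thresholds in a table next to the class names means the rules can be managed as reference data together with the taxonomy they belong to.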
Taxonomies Are Actually More Complex Models

Customer Table
Customer ID | Customer First Name | Customer Last Name | Customer Type | Social Security Number | Employer ID Number
123456 | ACME | Inc | CORPORATE | - | 11-5556666
334455 | John | Smith | INDIVIDUAL | 111-22-3333 | -
765432 | Amy | Chen | INDIVIDUAL | 22-33-4444 | -
Social Security Number: column only for INDIVIDUAL
Employer ID Number: column only for CORPORATE

• Average Conversion Cost per Customer will likely be much higher for CORPORATE than INDIVIDUAL
• Marketing campaigns will likely be quite different for CORPORATE and INDIVIDUAL and there will be a need to distinguish them
• Taxonomies are needed to classify data
• We are dealing with different sets of records, so these taxonomies cannot be in a Data Dictionary
• That is, the relation between a column and the Type (in this case).
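The relation between a column and the Customer Type can be written down as a rule and checked against records. The sketch below assumes a hypothetical rule table and snake_case column names; the second sample record contains a deliberately invented error to show the check firing.

```python
# Hypothetical rule table: which type-specific columns apply to which Customer Type.
COLUMNS_BY_CUSTOMER_TYPE = {
    "INDIVIDUAL": {"social_security_number"},
    "CORPORATE": {"employer_id_number"},
}
ALL_TYPE_SPECIFIC_COLUMNS = set().union(*COLUMNS_BY_CUSTOMER_TYPE.values())


def misapplied_columns(record):
    """List populated columns that do not apply to the record's Customer Type."""
    allowed = COLUMNS_BY_CUSTOMER_TYPE[record["customer_type"]]
    return sorted(
        column
        for column in ALL_TYPE_SPECIFIC_COLUMNS - allowed
        if record.get(column) is not None
    )


acme = {"customer_id": 123456, "customer_type": "CORPORATE",
        "employer_id_number": "11-5556666"}
suspect = {"customer_id": 334455, "customer_type": "INDIVIDUAL",
           "social_security_number": "111-22-3333",
           "employer_id_number": "11-5556666"}  # invented error for illustration

print(misapplied_columns(acme))
print(misapplied_columns(suspect))
```

A data dictionary describes each column once for the whole table, so it has no natural place for a rule like this one, which holds only for a subset of the records.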
Conceptual Models
Example of Conceptual Model
[Figure: conceptual model of business terminology, grouped under Terminology and Semantics. Entities: Business Concept, Business Term, Term, Business User, Preferred Term, Common Term, Official Term, Internal Official Term, External Official Term, External Organization, Descriptive Term, Full Term, Abbreviation, Acronym, Substitutionary Phrase, Definition, Short Definition, Long Definition, Calculation, Examples. Relationship labels: signifies, uses, requires, can be a, consists of (in part), is synonym of, is homonym of, is doublette of, used internally, used externally, must be used in certain circumstances, is used more commonly than other terms, is composed of complete words, is not composed of complete words, is composed of initial letters, is composed of words that provide an understanding, is everyday, is business specific]
• As detailed as possible
• BUT – enterprise-wide
• Sometimes thought of as “views”
• No standardized way of doing them
Example of Conceptual Model
• There is no standardized way of doing these diagrams (in terms of diagram notation)
• There may never be
• So try to make them as visually understandable as possible
[Figure: diagram of data quality dimensions, each illustrated with sample data: Semantic Integrity, Accuracy, Timeliness, Coverage, Existence, Completeness of Values, Completeness of Data Elements, Redundancy, Precision, Conformity, Referential Integrity, Consistency (Record/Value Level), Overloading, Data Definition (Missing/Unclear), Semantic Equivalence]
Definitions
• Not just a diagram
• All terms must be defined
• All relationships must be explained
• You can aggregate into a dictionary format across all terms, so you can use a traditional metadata repository
• This is necessary to conform terms and definitions where they occur in different models, which is a need of this approach
• And, again this will make it enterprise wide
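One way to picture the aggregation step described above is a small script that merges the terms of several conceptual models into one dictionary and flags terms whose definitions disagree. This is only a sketch under assumptions: the model names, terms, and definitions are invented sample content, and a real metadata repository would hold far more than a term and a definition.

```python
from collections import defaultdict


def build_dictionary(models):
    """Merge {model_name: {term: definition}} into {term: {model_name: definition}}."""
    dictionary = defaultdict(dict)
    for model_name, terms in models.items():
        for term, definition in terms.items():
            dictionary[term.strip().lower()][model_name] = definition.strip()
    return dict(dictionary)


def unconformed_terms(dictionary):
    """Return terms whose definitions differ between the models that use them."""
    return sorted(
        term
        for term, by_model in dictionary.items()
        if len(set(by_model.values())) > 1
    )


# Invented sample content; neither model nor definition comes from the slides.
models = {
    "Sales model": {"Customer": "A party that has bought a product"},
    "Marketing model": {
        "Customer": "A party that may buy a product",
        "Campaign": "A planned set of marketing activities",
    },
}

print(unconformed_terms(build_dictionary(models)))
```

The flagged terms are the ones that need a stewardship decision before the models can be treated as enterprise-wide.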
Conclusions
Conclusions: 1/2
• The Corporate Data Model (CDM) was a design for all systems in an enterprise. It failed because:
  - Packaged software made it irrelevant
  - It was oriented to operational systems
  - It could not be completed as the business changed
• But, today, in the Golden Age of Data, there is a tremendous demand to understand data and enterprise “data models” can satisfy that
  - But we need to be clear what they are trying to do (hint: not be database designs)
• Not all needed semantics can be accommodated in traditional data models
  - So we need newer, better approaches
Conclusions: 2/2
• At a high level, Subject Area Models are essential, and clearly enterprise-wide
  - There can be many per enterprise, which is not a popular idea
  - They need to be decomposed to lower levels of detail than is normally accepted, but not below the level of major entity types
• Taxonomies are essential, and though they are very low level, they should be enterprise-wide
  - But they are often thought of as “just reference data” without the rules they need
• Complex, but discrete, conceptual (information) models are highly useful and can be enterprise-wide
  - They are needed for Big Data, Data Lakes
  - They are needed for the abstraction layer between data stores and business information
  - But they are new and may never have a common notation
  - Yet they are the exciting new frontier for data modeling
Thank you!
Malcolm Chisholm Ph.D.