
Page 1: The First Step in Information Management - DAMA NY

The First Step in Information Management

www.firstsanfranciscopartners.com

The Benefits and Uses of an Enterprise Data Model

Malcolm Chisholm Ph.D., Chief Innovation Officer

First San Francisco Partners

Page 2

A Brief History of Enterprise Data Models

Page 3

The Evolution of the Data Profession

pg 2 © 2017 First San Francisco Partners www.firstsanfranciscopartners.com Proprietary and Confidential

1970s: First glimmerings of life
-  Relational database theory emerges
-  Early adoption of RDBMSs
-  Development of theories of design (normalization)
-  Development of methodologies for design (Chen)

1980s: Early world
-  Wide adoption of RDBMSs
-  Use of SQL grows
-  Tools available for data modeling
-  PC revolution
-  Downsizing
-  Promise of the Corporate Data Model

1990s: The Good Times
-  Huge growth in IT carries Data Administration (DA) with it
-  More tools, e.g. ERwin
-  Data warehousing begins
-  ERP implementations
-  Metadata management
-  But little delivery of the Corporate Data Model

2000+: Mass Extinction Events
-  Internet bubble bursts
-  9/11
-  Formerly “Neanderthal” bricks-and-mortar businesses re-emerge
-  DA not seen as delivering
-  DA cutbacks

2000-2005: Dark Age
-  Very little activity
-  Data warehousing continues
-  Stirrings of Master Data Management (MDM)
-  Stirrings of Data Governance (DG)

2005 onward: Golden Age of Data
-  Data Governance (DG) leaps to prominence
-  Financial Crisis emphasizes data needs
-  Big Data emerges
-  Data at the center of business models (Google, Facebook)
-  Legal challenges in Data Management

Page 4

A Longer-Term Trend: Process-Centricity to Data-Centricity

[Figure: timeline running from the 1960s to today]

•  Originally, IT sought to automate processes that had hitherto been manual
•  Today, the emphasis is on unlocking value inherent in the data
•  Big Data and Data Lakes are just more steps in this evolution
•  But how do you describe the data?

Page 5

The Corporate Data Model

•  The Corporate Data Model flourished in the late 1980s and early 1990s
•  The idea was that if there was one database design for the entire enterprise, then all systems (applications) could be built on it
•  The benefit would be automatic integration

Page 6

The Demise of the Corporate Data Model

•  The CDM was based on mainframe-era thinking: that all applications were developed in-house.
•  It was also heavily oriented to operational systems.
•  But packages became available for operational systems in nearly every niche. These came with fixed database designs.
•  Also, the data modelers doing the CDM could not keep up with changes in the business that affected parts of the CDM already “completed”.
•  Doing the CDM took years. This was too long for management with shorter-term horizons.

Page 7

But Things Are Changing

•  In the accelerating shift to data-centricity, executives are beginning to understand that:
   -  There are successful business models that put data at their center
   -  The data in their enterprise is not in a state in which it can be used
   -  New businesses may be able to use data to compete with the enterprise
•  So we need enterprise data models, but what are they supposed to do for us?

[Figure: the average enterprise’s data resource (marked “?”) contrasted with highly successful data-centric companies that everyone interacts with]

Page 8

What Should An Enterprise Data Model Help With – 1/4

A User of Data Should Be Able To:

•  Know What Data The Enterprise Manages
•  Know What The Data Means
   -  Including Calculations and Derivations
•  Know Where The Data Is Stored
   -  At a Minimum The Authoritative Source
•  Know Who Is Allowed To Access The Data (Security)
   -  If They Are Allowed to Know This
•  Know How to Get The Data
•  Know What Can Be Done with The Data (Privacy, Compliance)
•  Know What Decisions Have Been Made about The Data
   -  Governance, Stewardship

Page 9

What Should An Enterprise Data Model Help With – 2/4

A User of Data Should Be Able To:

•  Know What Quality Issues Exist with The Data
•  Know Who Else Is Interested in The Data
   -  Stakeholder Community
•  Know Who to Contact if There Are Issues with The Data
   -  Know What Processes Exist to Resolve Issues
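One way to make the questions on these two slides concrete is to treat them as the fields of a catalog entry for each data element. The Python sketch below assumes exactly that; the class name `CatalogEntry`, every field name, and the example values are hypothetical, not taken from any standard or product.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class CatalogEntry:
    """Hypothetical catalog entry: one field per question a data user should be able to answer."""

    name: str                                    # What data the enterprise manages
    meaning: Optional[str] = None                # What it means, incl. calculations and derivations
    authoritative_source: Optional[str] = None   # Where it is stored (at a minimum, the authoritative source)
    access_policy: Optional[str] = None          # Who is allowed to access it (security)
    how_to_get: Optional[str] = None             # How to get the data
    permitted_uses: Optional[str] = None         # What can be done with it (privacy, compliance)
    issue_contact: Optional[str] = None          # Who to contact, and which process resolves issues
    governance_decisions: List[str] = field(default_factory=list)  # Governance, stewardship decisions
    known_quality_issues: List[str] = field(default_factory=list)  # Known quality issues
    stakeholders: List[str] = field(default_factory=list)          # Stakeholder community

    def unanswered(self) -> List[str]:
        """Single-valued questions that are still unanswered (None) for this entry."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    entry = CatalogEntry(
        name="Customer Credit Score",
        meaning="Score used to classify applicants by risk",
        authoritative_source="Underwriting system",
    )
    print(entry.unanswered())  # ['access_policy', 'how_to_get', 'permitted_uses', 'issue_contact']
```

Empty lists are deliberately not reported as gaps, since "no known quality issues" can be a legitimate answer; only questions left as `None` are flagged.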

Page 10

What Should An Enterprise Data Model Help With – 3/4

With This Knowledge A Data User Will Be Able To:

•  Use the data they rely on to perform their assigned responsibilities
•  Ensure that the data can always be turned into information
•  Participate effectively in stewardship functions that assure the quality, privacy, and security of the data, and that it meets compliance requirements

Page 11

What Should An Enterprise Data Model Help With – 4/4

The Enterprise as A Whole Will Benefit as Data Users Also:

•  Use the data to increase the efficiency of the enterprise’s operations, including re-engineering of business processes to take advantage of improved data understandability, availability, and quality.
•  Use the data to meet the enterprise’s business goals, including adaptation to changing market, regulatory, and other environments, and agility in responding to new opportunities.
•  Use the data to mitigate risk in the enterprise, including reduction of operational risk inherent in the data itself as data quality improves.

Page 12

Semantics and the Enterprise Data Model

Page 13

Definitions of “Semantics”

…IDEF1X is a semantic data modeling technique. It is used to produce a graphical information model which represents the structure and semantics of information within an environment or system. Use of this standard permits the construction of semantic data models which may serve to support the management of data as a resource, the integration of information systems, and the building of computer databases.

The doctrine of historical word-meanings…

•  “Semantics” has traditionally meant the meaning of words in human language, but this is not good enough for data

Page 14

But Not All Semantics Are in Traditional Data Models

[Figure: relationship diagram linking Acme Widget Co Inc, Global Widgets Inc, and Mega Investors LLC through the relationships “Is Majority Owner of” and “Is a Subsidiary of”]

Organization Table
Org ID | Org Name           | Related Org ID | Relation Type Code
111    | Acme Widget Co Inc | 222            | SU
222    | Global Widgets Inc | 333            | MO
333    | Mega Investors LLC |                |

Org Relation Type Table
Code | Description
MO   | Is Majority Owner of
SU   | Is a Subsidiary of

[Figure: Data Model, Database, Reality]

•  A database is one level of abstraction removed from reality, and a data model is two levels removed.
•  A data model aims to optimize data storage, e.g. insulate data structures from business change.
•  Therefore, not all business semantics are captured in a data model.
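The Organization tables on this slide can illustrate the point. The minimal Python sketch below loads them into an in-memory database with the standard-library `sqlite3` module. It rests on two assumptions that the slide does not state: each row reads as “Org Name <relation> Related Org” (so Acme Widget Co Inc Is a Subsidiary of Global Widgets Inc), and a hypothetical business rule, recorded nowhere in the schema, says “Is Majority Owner of” is the inverse of “Is a Subsidiary of”. Table and column names are illustrative.

```python
import sqlite3
from typing import List

# HYPOTHETICAL business rule that the schema itself does not record:
# "X Is Majority Owner of Y" implies "Y Is a Subsidiary of X" (MO and SU are inverses).
INVERSE_OF = {"MO": "SU", "SU": "MO"}

SCHEMA_AND_DATA = """
CREATE TABLE org_relation_type (
    code        TEXT PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE organization (
    org_id             INTEGER PRIMARY KEY,
    org_name           TEXT NOT NULL,
    related_org_id     INTEGER,
    relation_type_code TEXT REFERENCES org_relation_type(code)
);
INSERT INTO org_relation_type VALUES ('MO', 'Is Majority Owner of'), ('SU', 'Is a Subsidiary of');
INSERT INTO organization VALUES
    (111, 'Acme Widget Co Inc', 222, 'SU'),
    (222, 'Global Widgets Inc', 333, 'MO'),
    (333, 'Mega Investors LLC', NULL, NULL);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    return conn


def stored_subsidiaries(conn: sqlite3.Connection, parent_id: int) -> List[str]:
    """Only what the rows say literally: orgs recorded as 'SU' of the parent."""
    rows = conn.execute(
        "SELECT org_name FROM organization "
        "WHERE relation_type_code = 'SU' AND related_org_id = ? ORDER BY org_name",
        (parent_id,),
    ).fetchall()
    return [name for (name,) in rows]


def inferred_subsidiaries(conn: sqlite3.Connection, parent_id: int) -> List[str]:
    """Stored facts plus facts derived from the inverse rule held outside the schema."""
    names = set(stored_subsidiaries(conn, parent_id))
    for code, inverse in INVERSE_OF.items():
        if inverse != "SU":
            continue
        # "parent <code> child" implies "child SU parent".
        rows = conn.execute(
            "SELECT r.org_name FROM organization AS o "
            "JOIN organization AS r ON r.org_id = o.related_org_id "
            "WHERE o.org_id = ? AND o.relation_type_code = ?",
            (parent_id, code),
        ).fetchall()
        names.update(name for (name,) in rows)
    return sorted(names)


if __name__ == "__main__":
    conn = build_db()
    print(stored_subsidiaries(conn, 222))    # ['Acme Widget Co Inc']
    print(inferred_subsidiaries(conn, 222))  # ['Acme Widget Co Inc', 'Mega Investors LLC']
```

Under these assumptions, a query over the stored rows alone finds only Acme Widget Co Inc as a subsidiary of Global Widgets Inc; Mega Investors LLC appears only once the inverse rule, which lives outside the data model, is applied.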

Page 15

The Scope of Semantics

•  Terms / Concepts: The concepts used in the business and the terms used to identify them, and their definitions
•  Taxonomies: The relationships of general with specific concepts
•  Hierarchies: How individual things (instances) are associated at multiple levels for specific business needs
•  Relationships: Other business relationships outside taxonomies and hierarchies
•  Business Rules: Atomic units of logic that govern behavior of concepts and relationships
•  Ontologies: Each is a “view” of the business world (and business information) that is required to meet a specific business need

•  Semantics: the understanding of information without any concern about how it may be stored as data
•  On this view there are many semantic models, rather than one per enterprise

Page 16

Subject Area Models

Page 17

Subject Area Model

•  A fundamental enterprise data model showing the major areas of enterprise data concern
•  Within each area there is terminological consistency
•  Aids in analytics, e.g. Data Marts are subject-oriented
•  BUT: There can be many of these, e.g. for the enterprise as a whole, for metadata (shown here), for the world outside the enterprise (reference data), and at lower levels too

Page 18

Example of Metadata Subject Area Model

Advantages
1. Standardizes the metadata in the enterprise at a high level.
2. Can be used to prioritize projects and programmes.
3. Useful in communications.
4. Shows what metadata is being produced by subject area. This is a high-level inventory.
5. Can distinguish data-related metadata from other data.
6. Within each subject area, definitions should be consistent. There may be variations across subject areas.
7. Subject areas are candidates for conceptual models.

Disadvantages
1. It is a taxonomy and can be argued over. No one taxonomy will satisfy every perspective. We use data categorization for that (see later). The Subject Area Model should be the most common and natural perspective.
2. The Subject Area Model often has more things expected of it than it can deliver. Managing expectations can be tricky.

Page 19

Take the Subject Area Model to Lower Levels of Detail

•  Take it as far as you want, but it is still high level
•  Stop when you get to the level of major entities
•  Terminology becomes a problem

Page 20

Who Owns the Subject Area Models?

•  Subject Area Models are produced by a modeling exercise, so staff with modeling expertise need to be involved in doing them
•  Subject Area Models are important artifacts that have a lot of impact, so Data Governance and Data Architecture need to be involved

Page 21

Taxonomies

Page 22

What Is a Taxonomy?

Greek root = “Orderly Arrangement” + “A Law”

“The laws and principles of the classifying of natural objects; that department of science which treats of classification” (Baldwin’s Dictionary of Philosophy)

Taxonomies must involve some element of hierarchy. Often it is only at two levels.

•  A lower-level model than a Subject Area Model
•  Often implemented as reference data
•  Appears simple, but is it?

Page 23

Taxonomies Are Actually More Complex Models

•  Classification rules are often needed
•  Thus, these enterprise models are not simply diagrams
•  Usage rules may be needed too – people can misuse taxonomies

[Figure: Applicants are classified into the “Auto Risk” taxonomy, whose classes are High Risk, Medium Risk, and Low Risk]

Classification Rules:
If Credit Score < 620 = High Risk
If Credit Score >= 620 & Credit Score < 680 = Medium Risk
If Credit Score >= 680 = Low Risk

Usage Rules: e.g. how to price the 3 different classes of risk
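The classification rules on this slide are precise enough to execute, which shows why a taxonomy is more than a diagram. The Python sketch below encodes them as stated; the names `AutoRisk`, `classify_applicant`, and `quote_premium` are hypothetical, and the premium multipliers standing in for a usage rule are invented purely for illustration, since the slide gives no pricing.

```python
from enum import Enum


class AutoRisk(Enum):
    """The three classes of the Auto Risk taxonomy."""
    HIGH = "High Risk"
    MEDIUM = "Medium Risk"
    LOW = "Low Risk"


def classify_applicant(credit_score: int) -> AutoRisk:
    """Classification rules from the slide: < 620 High, 620 to 679 Medium, >= 680 Low."""
    if credit_score < 620:
        return AutoRisk.HIGH
    if credit_score < 680:  # credit_score >= 620 already holds here
        return AutoRisk.MEDIUM
    return AutoRisk.LOW


# Usage rule: HYPOTHETICAL premium multipliers, invented for illustration only.
PREMIUM_FACTOR = {AutoRisk.LOW: 1.0, AutoRisk.MEDIUM: 1.25, AutoRisk.HIGH: 1.6}


def quote_premium(base_premium: float, credit_score: int) -> float:
    """Apply the (hypothetical) usage rule for the applicant's risk class."""
    return base_premium * PREMIUM_FACTOR[classify_applicant(credit_score)]


if __name__ == "__main__":
    for score in (600, 620, 679, 680):
        print(score, classify_applicant(score).value)
```

Testing the boundary scores (619/620 and 679/680) is where such rules most often go wrong, for example when one tier uses `<=` and the next uses `>=` on the same value.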

Page 24

Taxonomies Are Actually More Complex Models

Customer Table
Customer ID | Customer First Name | Customer Last Name | Customer Type | Social Security Number (column only for INDIVIDUAL) | Employer ID Number (column only for CORPORATE)
123456      | ACME                | Inc                | CORPORATE     |                                                     | 11-5556666
334455      | John                | Smith              | INDIVIDUAL    | 111-22-3333                                         |
765432      | Amy                 | Chen               | INDIVIDUAL    | 22-33-4444                                          |

•  Average Conversion Cost per Customer will likely be much higher for CORPORATE than for INDIVIDUAL
•  Marketing campaigns will likely be quite different for CORPORATE and INDIVIDUAL, and there will be a need to distinguish them
•  Taxonomies are needed to classify data
•  We are dealing with different sets of records, so these taxonomies cannot be in a Data Dictionary
•  That is, a Data Dictionary cannot hold the relation between a column and the Type (in this case, Customer Type)
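The rule that a Data Dictionary cannot hold, that Social Security Number applies only to INDIVIDUAL rows and Employer ID Number only to CORPORATE rows, can be written down and checked explicitly. The Python sketch below does so under stated assumptions: the names `COLUMNS_ONLY_FOR`, `Customer`, and `rule_violations` are hypothetical, and the name columns are omitted because the rule does not involve them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Slide annotations: SSN is a column "only for INDIVIDUAL", EIN a column "only for CORPORATE".
# This column-to-type relation is the part a plain data dictionary does not capture.
COLUMNS_ONLY_FOR: Dict[str, Set[str]] = {
    "social_security_number": {"INDIVIDUAL"},
    "employer_id_number": {"CORPORATE"},
}
CUSTOMER_TYPES = {"INDIVIDUAL", "CORPORATE"}


@dataclass
class Customer:
    customer_id: str
    customer_type: str
    social_security_number: Optional[str] = None
    employer_id_number: Optional[str] = None


def rule_violations(c: Customer) -> List[str]:
    """Return human-readable violations of the type-dependent column rules."""
    if c.customer_type not in CUSTOMER_TYPES:
        return [f"unknown Customer Type {c.customer_type!r}"]
    problems = []
    for column, allowed_types in COLUMNS_ONLY_FOR.items():
        # A populated column is a violation if this Customer Type may not use it.
        if getattr(c, column) is not None and c.customer_type not in allowed_types:
            problems.append(f"{column} is only for {', '.join(sorted(allowed_types))}")
    return problems


if __name__ == "__main__":
    print(rule_violations(Customer("334455", "INDIVIDUAL", social_security_number="111-22-3333")))  # []
    print(rule_violations(Customer("123456", "CORPORATE", social_security_number="111-22-3333")))
```

The second call reports `['social_security_number is only for INDIVIDUAL']`. The same Customer Type taxonomy could then drive the different conversion-cost and marketing treatment the slide mentions.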

Page 25

Conceptual Models

Page 26

Example of Conceptual Model

[Figure: conceptual model of business terminology, divided into Terminology and Semantics areas. Entities include Business Term and its subtypes (Preferred, Common, Official, Internal Official, External Official, Descriptive, and Full Terms, Abbreviations, Acronyms, Substitutionary Phrases), Business Concept, Definition (Short and Long), Calculation, Examples, Business User, and External Organization. Relationships include “signifies”, “uses”, “requires”, “is synonym of”, “is homonym of”, and “is doublette of”.]

•  As detailed as possible
•  BUT – enterprise-wide
•  Sometimes thought of as “views”
•  No standardized way of doing them
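A small corner of a conceptual model like this, business terms that signify business concepts, already yields useful derived semantics: synonyms are different terms signifying the same concept, and homonyms are one term signifying different concepts. The Python sketch below shows that idea under stated assumptions; the class names echo the diagram's entity names, but the structure, function names, and all example terms and definitions are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class BusinessConcept:
    concept_id: str
    short_definition: str


@dataclass(frozen=True)
class BusinessTerm:
    text: str
    kind: str                   # e.g. "Full Term", "Acronym", "Abbreviation"
    signifies: BusinessConcept  # the "signifies" relationship in the diagram


def synonyms_of(term: BusinessTerm, terms: List[BusinessTerm]) -> List[str]:
    """Other term texts that signify the same concept as `term`."""
    return sorted({t.text for t in terms if t.signifies == term.signifies and t.text != term.text})


def homonyms(terms: List[BusinessTerm]) -> List[str]:
    """Term texts that signify more than one concept."""
    concepts_by_text: Dict[str, Set[str]] = defaultdict(set)
    for t in terms:
        concepts_by_text[t.text].add(t.signifies.concept_id)
    return sorted(text for text, ids in concepts_by_text.items() if len(ids) > 1)


# Hypothetical example data, invented for illustration.
PARTY = BusinessConcept("C1", "A person or organization that buys from the enterprise")
LEDGER = BusinessConcept("C2", "A record in the general ledger")
TERMS = [
    BusinessTerm("Customer", "Full Term", PARTY),
    BusinessTerm("Client", "Full Term", PARTY),
    BusinessTerm("Account", "Full Term", PARTY),
    BusinessTerm("Account", "Full Term", LEDGER),
    BusinessTerm("GL", "Acronym", LEDGER),
]

if __name__ == "__main__":
    print(synonyms_of(TERMS[0], TERMS))  # ['Account', 'Client']
    print(homonyms(TERMS))               # ['Account']
```

Keeping the term and the concept as separate entities is what makes both questions answerable; a model that stores only term names and definitions side by side cannot distinguish the two uses of "Account".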

Page 27

Example of Conceptual Model

•  There is no standardized way of doing these diagrams (in terms of diagram notation)
•  There may never be one
•  So try to make them as visually understandable as possible

[Figure: example conceptual model of data quality dimensions, each illustrated with sample data: Semantic Integrity, Accuracy, Timeliness, Coverage (Known Universe, Unknown Universe), Completeness of Values, Completeness of Data Elements, Redundancy, Precision, Conformity, Referential Integrity, Consistency (Record/Value Level), Overloading, Data Definition (Missing/Unclear), Semantic Equivalence, and Existence]

Page 28

Definitions

•  Not just a diagram
•  All terms must be defined
•  All relationships must be explained
•  You can aggregate into a dictionary format across all terms, so you can use a traditional metadata repository
•  This is necessary to conform terms and definitions where they occur in different models, which this approach requires
•  And, again, this will make it enterprise-wide
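Aggregating the definitions from many models into one dictionary is what exposes the terms that still need conforming. The Python sketch below assumes each model can be exported as a simple mapping of term to definition; the function names, the `Dictionary` type alias, and the example models and definitions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

# term -> definition -> names of the models that use that definition
Dictionary = Dict[str, Dict[str, Set[str]]]


def aggregate_definitions(models: Dict[str, Dict[str, str]]) -> Dictionary:
    """Flatten {model name: {term: definition}} into one enterprise-wide dictionary."""
    dictionary: Dictionary = defaultdict(lambda: defaultdict(set))
    for model_name, terms in models.items():
        for term, definition in terms.items():
            # Normalize whitespace so trivial formatting differences are not flagged.
            dictionary[term][" ".join(definition.split())].add(model_name)
    return dictionary


def terms_to_conform(dictionary: Dictionary) -> List[str]:
    """Terms that carry more than one definition across models and so need conforming."""
    return sorted(term for term, definitions in dictionary.items() if len(definitions) > 1)


# Hypothetical models and definitions, invented for illustration.
MODELS = {
    "Sales conceptual model": {
        "Customer": "A party that has bought at least one product",
        "Order": "A request from a customer to buy products",
    },
    "Marketing conceptual model": {
        "Customer": "A party the enterprise is marketing to",
        "Campaign": "A coordinated set of marketing activities",
    },
}

if __name__ == "__main__":
    merged = aggregate_definitions(MODELS)
    print(terms_to_conform(merged))  # ['Customer']
```

Keeping the model names against each definition means the output can feed straight into a traditional metadata repository while still showing where each variant came from.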

Page 29

Conclusions

Page 30

Conclusions: 1/2

•  The Corporate Data Model (CDM) was a design for all systems in an enterprise. It failed because:
   -  Packaged software made it irrelevant
   -  It was oriented to operational systems
   -  It could not be completed as the business changed
•  But, today, in the Golden Age of Data, there is a tremendous demand to understand data, and enterprise “data models” can satisfy that
   -  But we need to be clear about what they are trying to do (hint: they are not database designs)
•  Not all needed semantics can be accommodated in traditional data models
   -  So we need newer, better approaches

Page 31

Conclusions: 2/2

•  At a high level, Subject Area Models are essential, and clearly enterprise-wide
   -  There can be many per enterprise, which is not a popular idea
   -  They need to be decomposed to lower levels of detail than is normally accepted, but not below the level of major entity types
•  Taxonomies are essential, and though they are very low level, they should be enterprise-wide
   -  But they are often thought of as “just reference data”, without the rules they need
•  Complex, but discrete, conceptual (information) models are highly useful and can be enterprise-wide
   -  They are needed for Big Data and Data Lakes
   -  They are needed for the abstraction layer between data stores and business information
   -  But they are new and may never have a common notation
   -  Yet they are the exciting new frontier for data modeling

Page 32

Thank you!
Malcolm Chisholm Ph.D.
