The Information Blueprint
Metadata

Chuck Kelley, Excellence In Data, LLC

Upload: phoebe-nelson

Post on 30-Dec-2015



Definition of Metadata

Metadata is
- Data about data
- The map of the Data Warehouse
- Defines the construction, health, and descriptive information

Metadata is not
- The data itself
- Master data
- External data (depending on the type of data!)

Hmmm, I wonder what this information really means?

- What information is available?
- What does it mean?
- How was it derived?
- What was its source?
- How current is it?
- Who uses it?
- How often is it used?

Metadata: the information “yellow pages”

Types of Metadata

- Technical
- Business
- Contextual

Technical Metadata

Technical metadata is data about data needed by the technology folks to do their work correctly. This includes the "good ole days" metadata, but adds much more. Technical metadata is used by the IT side to understand how the data warehouse/data mart was constructed:
- What is the system of record for a specific piece of data?
- What transformations were performed on what source data to produce data in the data warehouse/data mart?
- What are the columns in the data warehouse/data mart, and what do they mean?
- What is used to reconcile the data with the source system?
- When was the last date and time the data was loaded into the data warehouse/data mart?
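The questions above can be sketched as one record per warehouse column. This is only an illustration: the class, field names, and sample values (`ColumnMetadata`, `sales_fact`, `billing_system`, and so on) are assumptions made for the example, not part of the presentation or of any particular repository product.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class ColumnMetadata:
    """Technical metadata for one warehouse column (hypothetical layout)."""
    table: str
    column: str
    meaning: str                      # what the column means
    system_of_record: str             # where the data originates
    transformations: List[str] = field(default_factory=list)
    reconciled_against: str = ""      # what is used to reconcile with the source
    last_loaded: Optional[datetime] = None


def describe(col: ColumnMetadata) -> str:
    """Answer the technical-metadata questions for one column on one line."""
    loaded = col.last_loaded.isoformat() if col.last_loaded else "never"
    steps = " -> ".join(col.transformations) or "none"
    return (f"{col.table}.{col.column}: {col.meaning}; "
            f"source={col.system_of_record}; transforms={steps}; "
            f"reconcile={col.reconciled_against or 'n/a'}; last load={loaded}")


# Made-up sample entry, for illustration only.
revenue = ColumnMetadata(
    table="sales_fact",
    column="net_revenue",
    meaning="Invoice amount minus discounts",
    system_of_record="billing_system",
    transformations=["convert currency to USD", "subtract discounts"],
    reconciled_against="billing_system monthly control total",
    last_loaded=datetime(2015, 12, 30, 2, 0),
)
print(describe(revenue))
```

Keeping the record flat makes it easy to store one row per column in a repository table; a real metadata manager would likely normalize transformations and load history into their own structures.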

Business Metadata

Business metadata is data about data needed by the business community to do their work better. Business metadata is used by the business to understand what is available in the data warehouse/data mart and how, in their terminology, it is built. Business metadata includes
- Who is the data steward,
- What is the confidence level of the data and its quality,
- What algorithm is used to create the values,
- What is the definition of this data, and
- What reports are available.
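As a rough sketch of how the business-facing items above might be held, the example below keeps a small glossary keyed by business term. Every name and value in it (`BusinessTerm`, `GLOSSARY`, the steward titles, the report names) is hypothetical and chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class BusinessTerm:
    """Business metadata for one term, in the business's own vocabulary."""
    name: str
    definition: str
    data_steward: str
    confidence: str            # stated confidence in the data and its quality
    algorithm: str             # how the value is created, in plain words
    reports: Tuple[str, ...]   # reports where the term appears


# Hypothetical glossary content, for illustration only.
GLOSSARY: Dict[str, BusinessTerm] = {
    "net revenue": BusinessTerm(
        name="Net Revenue",
        definition="Invoiced sales after discounts and returns",
        data_steward="Finance data steward",
        confidence="high",
        algorithm="gross invoice amount - discounts - returns",
        reports=("Monthly Sales Summary", "Regional Revenue"),
    ),
    "active customer": BusinessTerm(
        name="Active Customer",
        definition="Customer with at least one order in the last 12 months",
        data_steward="Sales operations data steward",
        confidence="medium",
        algorithm="count of distinct customers with a recent order",
        reports=("Regional Revenue",),
    ),
}


def lookup(term: str) -> BusinessTerm:
    """Find a term regardless of case or surrounding spaces."""
    return GLOSSARY[term.strip().lower()]


def terms_on_report(report: str) -> List[str]:
    """List the business terms that a given report uses."""
    return sorted(t.name for t in GLOSSARY.values() if report in t.reports)


print(lookup("Net Revenue").data_steward)
print(terms_on_report("Regional Revenue"))
```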

Contextual Metadata

Contextual metadata is data that sets the "context" of your data. It really isn't metadata in the typical sense of the word, but it is classified as metadata nonetheless. Examples of contextual metadata are
- Weather reports,
- Headlines of the day, and
- Social, economic, and political issues.

Contextual metadata is the hardest to collect. Possible sources are newswire feeds (like AP, Wall Street Journal, Christian Science Monitor), Internet sites (http://www.weather.com, http://www.wsj.com), or just plain manual input (which is probably the least desirable).

How does contextual metadata help? Let's say that your organization is the Department of Energy and you noticed a major jump in spending on security during the late 1990s. Now, in 2009, the spending seems to be trending downward. How do you know why that might be happening? During the late 1990s (see, I don't remember the year or the name of the person already!), there was believed to be some breach of security, and classified data was believed to have been "stolen". If that information had been captured, then when the trend is discovered, we could look at the context of what was happening in the late 1990s to see if it helps explain the trend.
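One way to picture the scenario above is to store context events by year and attach them to a yearly measure when a trend is being examined. The sketch below assumes a simple in-memory layout; the spending figures, the years, and the event notes are placeholders invented for the example, not real Department of Energy data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ContextEvent:
    year: int
    note: str
    source: str   # e.g. newswire feed, Internet site, manual input


def annotate(series: Dict[int, float],
             events: List[ContextEvent]) -> Dict[int, List[str]]:
    """Attach the notes recorded for each year of a yearly measure."""
    notes: Dict[int, List[str]] = {year: [] for year in series}
    for event in events:
        if event.year in notes:
            notes[event.year].append(f"{event.note} ({event.source})")
    return notes


# Placeholder figures and years, made up for illustration.
security_spending = {1997: 100.0, 1998: 180.0, 1999: 175.0, 2009: 120.0}
events = [
    ContextEvent(1998, "Suspected breach of classified data reported", "newswire"),
    ContextEvent(2009, "No major security headlines", "manual input"),
]

context = annotate(security_spending, events)
for year in sorted(security_spending):
    print(year, security_spending[year], context[year])
```

The point of the design is only that the context is captured at the time it happens, so it is still available when someone asks about the trend years later.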

[Figure: a house blueprint (room 1, room 2, room 3, kitchen, garage) shown as the metadata for the Data Warehouse]

Metadata provides the blueprint of the Data Warehouse

Importance of Technical Metadata

- Serves the IT community with operational detail about information systems.

However
- Metadata is not the primary focus of IT
- Looked upon as a documentation exercise of minimal value
- Often relegated to “nice to have” status

Importance of Business Metadata

- Serves the Business Community as a source to discover what and where information exists
- Business meaning takes precedence over technical detail (look for commonality)
- Looked upon as a key source for knowledge on operational processes

Importance of Metadata

- Serves the Data Warehouse as a key enabler
- Of primary importance to DSS Analysts
- Metadata is critical to tracking the content and validity of data in the Warehouse
- Provides context to the data

Issue: “Knowledge Gap” between OLTP and DW can affect the success of Data Warehousing implementations

Importance of Metadata

What it does

Describes data in operational systems, which facilitates
- mapping data elements to the DW
- data conversion
- aggregation & summarization logic
- coordinating naming conventions
- managing anomalies between physical characteristics of common data across information systems

Importance of Metadata

Metadata provides a new dimension. It allows
- Data to be managed over time (5-10 years)
- Data to be managed by context (business meaning and business value will change over time)

Manages structural changes to the DW database (versioning of metadata)

Allows operational systems to reinvent themselves by discovering corporate data which exists across systems

Metadata Exists Everywhere!

[Diagram: a Metadata Manager at the center, connected to the places where metadata lives]
- Physical Database Definition
- External Sources
- Operational Data Sources
- Summarization
- Data Warehouse Data Model
- Internal Non-Operational Data Sources

Sources of metadata named in the diagram:
- Database Definitions
- File Definitions
- COBOL Copybooks
- Data Extraction Tool Data Dictionary
- Data Modeling Tool Data Dictionary
- DSS Tool Data Definitions
- DSS Tool Business Catalog

Metadata and the Data Warehouse

[Diagram: the areas of the data warehouse that metadata describes]
- Interface, Transformation, and Load
- Data Aggregation
- Data Access
- Data Warehouse Data Model
- Data Quality
- Data Warehouse Operations
- Business Rules/Derived Measures

Interface, Transform, and Load

- Describes the interface location and content.
- Describes information about the transformation from source system codes to reference data codes.
- Describes information about custom transformations, such as using subsets of the data.
- Describes information about the Data Warehouse destination.

ETL Metadata Points

[Diagram: Data Sources, then Extract, Filter, Cleanse, Transform, and Log/QA, then the Data Warehouse; metadata is collected at each step]

Information Sourcing Activities

Activity    Description                                      Outcomes
Extract     Pull data from operational systems               Raw Data
Filter      Discard “noise” data from data set               Dirty Data
Cleanse     Analyze data quality and make corrections        Clean Data
Transform   Rearrange and summarize data                     Useful Data
Log/QA      Perform final check and build “Yellow Pages”     Verified Data, Metadata
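The five activities can be sketched as a small pipeline that records a count at each step, so the Log/QA stage has something to publish as metadata. This is a minimal sketch under assumptions: the row layout (`customer`, `amount`), the noise rule, and the function names are all invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]


def extract(source: List[Row]) -> List[Row]:
    """Extract: pull raw rows from the operational system (copied here)."""
    return [dict(r) for r in source]


def filter_noise(rows: List[Row]) -> List[Row]:
    """Filter: discard rows with no customer (the assumed 'noise' rule)."""
    return [r for r in rows if r.get("customer")]


def cleanse(rows: List[Row]) -> List[Row]:
    """Cleanse: normalize names and replace missing amounts with zero."""
    cleaned = []
    for r in rows:
        fixed = dict(r)
        fixed["customer"] = fixed["customer"].strip().upper()
        fixed["amount"] = (fixed.get("amount") or "0").strip()
        cleaned.append(fixed)
    return cleaned


def transform(rows: List[Row]) -> Dict[str, float]:
    """Transform: summarize amounts per customer."""
    totals: Dict[str, float] = {}
    for r in rows:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + float(r["amount"])
    return totals


def run(source: List[Row]) -> Tuple[Dict[str, float], Dict[str, int]]:
    """Run all steps; the returned log plays the role of the Log/QA metadata."""
    log: Dict[str, int] = {}
    raw = extract(source)
    log["extracted"] = len(raw)
    dirty = filter_noise(raw)
    log["after_filter"] = len(dirty)
    clean = cleanse(dirty)
    log["cleansed"] = len(clean)
    useful = transform(clean)
    log["summary_rows"] = len(useful)
    return useful, log


# Made-up operational rows.
source_rows = [
    {"customer": " acme ", "amount": "10.5"},
    {"customer": None, "amount": "3"},
    {"customer": "ACME", "amount": "2"},
    {"customer": "zenith", "amount": None},
]
totals, load_log = run(source_rows)
print(totals)
print(load_log)
```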

DW Data Model

- A description of each attribute and entity of the data model.
- This is an extract from the CASE tool that manages the data model, or the output of a data dictionary.

Aggregation

- Rules-based engine for aggregation.
- States which fields from which DW tables are combined and the algorithm that aggregates the data.
- Used to create code or to suggest stored procedures for aggregation.
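To illustrate "used to create code", the sketch below turns one aggregation rule into a SQL statement. The rule layout, the table and column names, and the generated `INSERT ... SELECT` shape are assumptions made for the example; a real rules engine would also need to validate identifiers and handle DBMS-specific syntax.

```python
from dataclasses import dataclass
from typing import List

ALLOWED_FUNCTIONS = {"SUM", "AVG", "MIN", "MAX", "COUNT"}


@dataclass(frozen=True)
class AggregationRule:
    """Aggregation metadata: which fields of which table, and the algorithm."""
    target_table: str
    source_table: str
    group_by: List[str]
    measure: str
    function: str


def to_sql(rule: AggregationRule) -> str:
    """Generate the aggregation statement described by the rule."""
    fn = rule.function.upper()
    if fn not in ALLOWED_FUNCTIONS:
        raise ValueError(f"unsupported aggregate: {rule.function}")
    keys = ", ".join(rule.group_by)
    return (
        f"INSERT INTO {rule.target_table} ({keys}, {rule.measure})\n"
        f"SELECT {keys}, {fn}({rule.measure})\n"
        f"FROM {rule.source_table}\n"
        f"GROUP BY {keys}"
    )


# Hypothetical rule: monthly revenue by region.
rule = AggregationRule(
    target_table="sales_by_region_month",
    source_table="sales_fact",
    group_by=["region", "month"],
    measure="net_revenue",
    function="sum",
)
sql = to_sql(rule)
print(sql)
```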

Data Access - Reporting

Report Generation Metadata
- A rules-based reporting tool that describes a report format, from the header to the columns and rows.

Report Menu Metadata
- Describes the reports that are available.
- Describes the menu that the user is shown for accessing the available reports.

Data Access - Query

- Defines canned queries available.
- Defines public and private queries.
- Allows queries to be combined.

Data Access - End User

[Diagram: metadata available to the End User Application]
- Data Model
- Interface, Transformation, and Load
- Application Help files
- Data Warehouse
- Derived Business Measures

Data Quality

DW Load Statistics
- The use of control numbers from the source system, compared to the load data.

DW Quality Rules
- Rules that track known data trends for report checking.
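The load-statistics idea, comparing control numbers from the source system with what was actually loaded, might look like the sketch below. The control record layout (`row_count`, `amount_total`), the field name `amount`, and the tolerance are assumptions for illustration only.

```python
from typing import Dict, List


def reconcile(control: Dict[str, float],
              loaded_rows: List[Dict[str, float]],
              amount_field: str = "amount",
              tolerance: float = 0.005) -> Dict[str, object]:
    """Compare source control numbers with counts and totals of loaded rows."""
    loaded_count = len(loaded_rows)
    loaded_total = sum(row[amount_field] for row in loaded_rows)
    return {
        "source_count": control["row_count"],
        "loaded_count": loaded_count,
        "count_matches": loaded_count == control["row_count"],
        "source_total": control["amount_total"],
        "loaded_total": loaded_total,
        "total_matches": abs(loaded_total - control["amount_total"]) <= tolerance,
    }


# Made-up control record from the source system and made-up loaded rows.
control = {"row_count": 3, "amount_total": 350.5}
rows = [{"amount": 100.0}, {"amount": 250.5}]

stats = reconcile(control, rows)
print(stats)
```

In this sample the totals agree but the row counts do not, which is exactly the kind of discrepancy load statistics are meant to surface before anyone trusts a report.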

Metadata Components

- Storage in the Warehouse
- Operational Mapping
- Extract History
- Volumetrics
- Algorithms
- Relationship History
- Ownership/Stewardship
- External/Reference Data
- Business Meaning (Data Models)

Metadata Components: Storage Requirements

- Database Schema
- Table Spaces
- Database Tables (Dimensions, Facts)
- Keys and Indexes
- Facts (Attributes)
- Information Access (Data Topology)
  - PCs
  - EIS
  - DSS
  - Operational

Metadata Components: Operational Mapping

- Location of data sources
- Data element conversion
  - Physical characteristic conversions
  - Naming changes
  - Default values
  - Encoding
- Data key changes
- Logic & algorithms
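A minimal sketch of mapping metadata driving data element conversion is shown below: each rule carries a naming change, a default value, and a conversion (a type change or a code translation). The source field names (`CUST_NM`, `ORD_AMT`, `STAT_CD`), the status codes, and the rule layout are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical encoding table: operational status codes to warehouse values.
STATUS_CODES = {"A": "ACTIVE", "I": "INACTIVE"}

# One rule per data element: naming change, default value, conversion.
MAPPING: List[Dict[str, Any]] = [
    {"source": "CUST_NM", "target": "customer_name", "default": "UNKNOWN",
     "convert": lambda v: v.strip()},
    {"source": "ORD_AMT", "target": "order_amount", "default": "0",
     "convert": float},
    {"source": "STAT_CD", "target": "status", "default": "",
     "convert": lambda v: STATUS_CODES.get(v, "UNKNOWN")},
]


def apply_mapping(record: Dict[str, str],
                  mapping: List[Dict[str, Any]] = MAPPING) -> Dict[str, Any]:
    """Convert one operational record into warehouse form using the rules."""
    out: Dict[str, Any] = {}
    for rule in mapping:
        raw = record.get(rule["source"])
        if raw is None or raw.strip() == "":
            raw = rule["default"]          # missing or blank: use the default
        out[rule["target"]] = rule["convert"](raw)
    return out


row = apply_mapping({"CUST_NM": " Acme ", "ORD_AMT": "", "STAT_CD": "A"})
print(row)
```

Because the rules are data rather than code, the same table can be shown to users as documentation of how each warehouse column was derived.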

Metadata Components: Extract History

- Logged history of data extracts and transformations
- Audit logs
- Job scheduling (batch, on-line)

Metadata Components: Volumetrics

- Number of tables
- Number of rows
- Usage characteristics
- Table indexing
- Aging criteria

Metadata Components: Algorithms

- Levels of summarization
- Criteria applied to data aggregation
- Data derivation

Metadata Components: Relationship History

- Relationship artifacts
- Relationship history
  - Tables included
  - Effective dates
  - Constraints in effect
  - Cardinality in effect
  - Description
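Relationship history with effective dates can be pictured as a list of versions per table pair, queried for the version in effect on a given date. The sketch below assumes a half-open date range and made-up tables, dates, and cardinalities; none of these details come from the presentation.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class RelationshipVersion:
    parent_table: str
    child_table: str
    cardinality: str               # cardinality in effect, e.g. "1:N"
    effective_from: date
    effective_to: Optional[date]   # None means still in effect
    description: str = ""


def in_effect(history: List[RelationshipVersion], parent: str, child: str,
              on: date) -> Optional[RelationshipVersion]:
    """Return the version of a relationship that applied on a given date."""
    for version in history:
        if (version.parent_table, version.child_table) != (parent, child):
            continue
        started = version.effective_from <= on
        not_ended = version.effective_to is None or on < version.effective_to
        if started and not_ended:
            return version
    return None


# Made-up history: the customer/order relationship changed in 2013.
history = [
    RelationshipVersion("customer", "order", "1:N",
                        date(2010, 1, 1), date(2013, 1, 1),
                        "Each order belongs to one customer"),
    RelationshipVersion("customer", "order", "N:M",
                        date(2013, 1, 1), None,
                        "Orders may be shared by several customers"),
]

print(in_effect(history, "customer", "order", date(2012, 6, 1)))
```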

Metadata Components: Ownership/Stewardship

- Operational ownership
  - Updates
  - Recovery
  - Accuracy
- Data Warehouse stewardship
  - Data consistency
  - Loading
  - Access

Metadata Components: External/Reference Data

- Location, type and content of external data
- Encoded values and changes
- Audit log of changes
- Date/time stamps

Metadata Components: Business Meaning (Data Models)

- Data Warehouse Data Model (Logical)
- Mapping to Data Warehouse Database Design (Physical)
- Mapping to Operational Systems Data Models (Corporate & Business Area)
- Mapping to other DW architecture metadata
  - EIS/DSS
  - Data Mining/Data Journalism

How to Use Metadata

Metadata Manager Requirements

Required features include:
- GUI
- Data Model Management
- Model/Data Versioning
- Data Access & Security
- Integration with the DW DBMS
- Integration with DW Architecture
- Unstructured Reference Data Management (futures)

How to Use Metadata

- Most current repositories are not extensible
- Few specialized tools are available for Metadata Mining (Data Re-engineering)
- There is no standard way to exchange metadata
  - between various Meta Manager tools
  - between EIS/DSS tool sets
  - between OLTP DBMS and DW DBMS
- OLTP CASE repositories which manage business models are not geared for Data Warehousing

However...

How to Use Metadata

- How to support it (Metadata Maintenance)
  - Care and feeding of metadata is just as important as the data itself
- Other considerations
  - How to get IT and the business client to use metadata
    - Have a single point of contact
    - Always do it at their terminal (DW or Client)
    - Always let them do it with your help

What to Look for in Products

From David Marco’s book, Building and Managing the Meta Data Repository: A Full Lifecycle Guide

Vendor Background

- Full name and business address of vendor.
- Parent company.
- Number of years company has been in business.
- Company structure. Is it a corporation, partnership, or privately held? List names associated with structure if different from question #1.
- Public or privately held company. If public, which exchange is the company traded on, and what is the company's market symbol?
- When did the company go public, or when is it expected to go public?
- Total number of employees worldwide?
- Total number of U.S. employees?
- Web site URL
- Number of developers supporting proposed product solution?
- Company profit/loss for the last three years (if available).

Proposed Solution Overview

- Summary of the vendor's proposed solution and an explanation of how it meets the needs.
- What are the names and versions of the product(s) component(s) comprising the vendor's proposed solution?
  - The repository architect and infrastructure architect need to carefully review all the components in the proposed solution and compare them with the target technical environment and support structure. How do the components communicate? What hardware platforms, DBMSs, Web servers and communications protocols do the components require? How are security and migration handled among the various components?
- Number of worldwide production installations using precisely this proposed solution configuration.
  - Be sure to consider the hardware, DBMS, Web server, etc. How many other companies are using the same configuration? Is your company going to be the first?
- What hardware, operating system, DBMS and web browser limitations does each of the product(s) component(s) have in the proposed solution on client and server platforms?
  - Be mindful of any requirements to download Java applets and/or ActiveX controls to the client. This might conflict with your company's web policy or, if deployed externally, with your clients' policies.
- What is the release date and version number history of each of the product(s) component(s) for the past 24 months?
- What is the anticipated release date and new feature list for each of the product(s) features and component(s) over the next 12 months?

Cost of Solution

- Total cost of proposed solution.
- Cost of consulting services required for installation.
  - Negotiate consulting time up front to complete staff training and get the repository up and running as quickly as possible.
- Cost of consulting services for initial project setup.
- What is the vendor's daily rate for consulting services without expenses?
- Annual maintenance cost/fee.
  - This should range anywhere from 14 percent to 18 percent of the solution price.
- Are all new product component releases/upgrades provided while under an annual maintenance agreement? If not, please explain in detail.
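The 14 to 18 percent maintenance guideline is easy to turn into a quick check when comparing vendor quotes. The sketch below is only arithmetic on that range; the function names and the sample price of 100,000 are made up for the example.

```python
from typing import Tuple


def maintenance_range(solution_price: float,
                      low_rate: float = 0.14,
                      high_rate: float = 0.18) -> Tuple[float, float]:
    """Expected annual maintenance fee range for a given solution price."""
    return (round(solution_price * low_rate, 2),
            round(solution_price * high_rate, 2))


def within_guideline(quoted_fee: float, solution_price: float) -> bool:
    """True if a quoted annual fee falls inside the 14-18 percent range."""
    low, high = maintenance_range(solution_price)
    return low <= quoted_fee <= high


# Hypothetical quote: a solution priced at 100,000.
print(maintenance_range(100_000))
print(within_guideline(22_000, 100_000))
```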

Technical Requirements

- Are there any database schema design requirements for the DSS data model in order to function with the repository product?
- Does the proposed solution require a change in the existing DSS schema design in order to function?
- How does the tool control the various versions of the meta data (development, quality assurance and production) stored in the repository?
- How is meta data from multiple DSS projects controlled and separated? How can the various projects share meta data?
  - The answer to this question will determine how you administer the product and provide security.
- Describe how meta data repository contents are migrated from one system engineering phase to the next (development, quality assurance and production). How does this processing sequence differ when dealing with multiple projects on various time lines?
  - In particular, how is meta data migrated through the various design phases? Can a single project or portion of a project be migrated forward? How?
- What DBMS privileges does the product support (e.g., roles, accounts, and views)?
- Can DBMS-specific SQL statements be incorporated into queries?

Technical Requirements

- Describe the security model used with the product.
- How does the product use existing infrastructure security systems?
- Does the product use any type of single sign-on authentication (e.g., LDAP)?
- Where are user security constraints for the product stored?
- Can a user have access to the repository tool for one project but no access for another project?
- Can a user view the SQL generated by the product?
- Is the product Web-enabled? Describe.

Implementation

- Describe the sequence of events and level of effort recommended for clients to consider in planning their implementation strategy.
- What is the typical duration of the implementation cycle?
- How many DSS database schema dimensions and facts can the proposed product solution handle?
- Provide a sample project plan for implementation of your proposed solution for a single DSS project.
- What client resource skill sets need to be in place for installation and implementation?

And Lastly, but Very Important

- Obtain at least three customer references from the vendor

Top Five Mistakes of Metadata

1. Not defining the objectives of the metadata
2. Purchasing the tool before the requirements
3. Choosing the tool before an evaluation
4. Making metadata too hard to utilize
5. Not understanding the effort of metadata

Conclusions

- Metadata is a critical component of any data warehouse
- Information users must learn how to use it
- Learn from others' mistakes

Chuck Kelley
30+ year professional in dealing with [email protected]

“I never metadata I didn’t like”