navigating the bi stack _

Upload: michael-phipps

Post on 22-Jan-2017

TRANSCRIPT

Page 1: Navigating the BI Stack _

Copyright 2014 Proprietary and Confidential

NAVIGATING THE BI STACK

DEVELOPING OPERATIONAL AND TRANSFORMATIONAL INSIGHT FROM THE WISDOM OF THE CROWD

Page 2: Navigating the BI Stack _

CROWD WISDOM

• The “Wisdom of the Crowd” is the collective opinion of a group of individuals rather than that of a single expert. This collective wisdom tends to produce more accurate information and better decisions than an individual’s, because individual bias and idiosyncratic noise tend to cancel out across the group.

• In an analytics context, we treat each original data source, in addition to the stakeholders, as one or more “opinions” about an aspect of the business.

• Multiple datasets, brought together in one location with the express purpose of “harmonizing data”, add to the collective wisdom of our crowd.

• The expectation is that while not all individuals will always agree, a ‘majority rule’ approach can be effective in decision making.

• Dissenting “opinions” have an important role to play as well, establishing checks and balances against propagated errors and organizational “groupthink”.

• Some examples of crowd wisdom are criminal juries and Google’s search engine.
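
The ‘majority rule’ idea, with dissenting “opinions” kept visible rather than discarded, can be sketched in a few lines of Python. This is only an illustration: the function name `majority_opinion`, the source names, and the plan-type values are hypothetical and do not come from the presentation.

```python
from collections import Counter


def majority_opinion(opinions):
    """Return (consensus, dissenters) for a mapping of source name -> reported value.

    The consensus is the value reported by a strict majority of sources,
    or None when no value has a strict majority (then every source dissents).
    """
    if not opinions:
        return None, []
    counts = Counter(opinions.values())
    value, votes = counts.most_common(1)[0]
    consensus = value if votes > len(opinions) / 2 else None
    # Dissenters are kept, not dropped: they are the "checks and balances".
    dissenters = sorted(src for src, v in opinions.items() if v != consensus)
    return consensus, dissenters


# Hypothetical example: three data sources report a member's plan type.
reported = {"claims": "HMO", "auths": "HMO", "benchmark": "PPO"}
consensus, dissenters = majority_opinion(reported)
```

Returning the dissenting sources alongside the consensus is a deliberate choice here, since the slide treats dissent as a signal worth reviewing.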

This article uses material from the Wikipedia article Wisdom of the Crowd, which is released under the Creative Commons Attribution-ShareAlike License 3.0.

Page 3: Navigating the BI Stack _

WHAT IS BI?

Business Intelligence is a set of tools used to implement an approach to understanding one’s business, industry, and the best actions to take to maximize value organizationally.

AN APPROACH

A method of analysis that compounds very large, complex datasets into relationship-driven metrics that allow an organization to consume data and take action across multiple systems.

Examples include data warehousing, semantic / statistical / predictive modeling, OLAP cubes, and big data universes.

A TOOLSET

A collection of software-driven tools used to acquire, interpret, and communicate key business data.

Examples include SQL, Business Objects, MicroStrategy, Crystal Reports, SSIS, SAS, Microsoft Excel, and many others.

Page 4: Navigating the BI Stack _

WHAT IS A BI STACK THEN?

• The BI Stack is the logical process of using the tools and methods of BI to process data.

• When data moves through the BI Stack, this is called “migrating” or “promoting”, in the sense that the data has passed through the tools successfully and without errors, as tested in the previous layers.

• There are many routes through the BI stack, but in general data only moves in one direction. Think of it like a big filter organizing data as it moves through the stack.

• The stack has many disciplines and players involved, and each part requires a specific skillset.

• The end result of the process is simple: actionable data that provides greater insight to an organization.
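
As a rough sketch of one-directional “promotion”, the Python below moves a batch of records through an ordered list of layers and stops at the first layer whose check fails. The function `promote`, the simplified layer list, and the `id` check are all assumptions made for illustration, not the presenter’s implementation.

```python
def promote(records, layers, checks):
    """Return the layers the records reached, in order.

    `checks` maps a layer name to a function that returns True when the
    records are allowed into that layer. Movement is one-directional and
    stops at the first failed check.
    """
    reached = []
    for layer in layers:
        check = checks.get(layer)
        if check is not None and not check(records):
            break
        reached.append(layer)
    return reached


# Hypothetical, simplified layer list and a single entry check.
layers = ["STAGE", "LOAD", "EDW"]
checks = {"LOAD": lambda recs: all(r.get("id") is not None for r in recs)}

clean = promote([{"id": 1}, {"id": 2}], layers, checks)
blocked = promote([{"id": 1}, {"id": None}], layers, checks)
```

Here `clean` reaches every layer, while `blocked` stops before LOAD because one record has no `id`.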

Page 5: Navigating the BI Stack _

WHAT IS AN EDW?

• An Enterprise Data Warehouse (EDW) is one part of the BI Stack. While it is a large part of the BI Stack (both physically and logically) it isn’t the end point for data.

• The EDW is very similar to a warehouse in the sense that it is a huge space to put objects for later use.

• Specifically, it is the end point for the data in terms of data transformations. When data reaches the EDW, it is for all intents and purposes as “clean” as it can be, but it may NOT be totally aligned with all business rules, standards, etc.

• A great deal of work still happens after the data warehouse. From a logical standpoint, the EDW occupies a place about 2/3 of the way through the BI stack.

Page 6: Navigating the BI Stack _

WHY IS AN EDW INSUFFICIENT FOR ACTIONABLE ANALYSIS?

• An EDW is often perceived as the end point of a BI method, but it’s actually much closer to the beginning from an analytic standpoint.

• Generally, analysts start to develop their work and interpretations once data arrives at the EDW. Most operational reporting will live at this level, but analysis continues on. This is what’s called Operational Insight, and while valuable, it really focuses on the questions “What am I doing, and what have I done?”

• Most aggregate metrics, analytic models, and semantic relationships cannot occur at this level without significant additional challenges.

Page 7: Navigating the BI Stack _

IF NOT THE EDW, THEN WHAT?

• An EDW is the foundation of a highly performing analytic toolset. We should think of data less as something to “warehouse” and more as something to keep moving, or “pumping”.

• This is the difference between a static BI Stack and an active BI Stack. One sees the EDW as an endpoint, whereas the other sees the EDW as a data pump, constantly refining data to get greater value from it.

• All relational analytic tools are based, in part, on a robust EDW. Most often, they will have other aspects built above it though. These include semantic layers, OLAP cubes, data universes/marts/stores, sandboxes, and other types of tools.

• These tools are often what result in Transformational Insight, or answers to questions such as “What should I do next?” or “What will my clients/suppliers/industry do in the future?” Other types of transformational questions are “What should I have done?” and “Was my prediction accurate?”

Page 8: Navigating the BI Stack _

DEFINING BI SUCCESS - SYSTEM

• Accessible

– The EDW should be accessible by all team members that interact with data in a way that is consumable, understandable, and simple to obtain.

• Adaptable

– As the business evolves, the way we store, perceive, and consume data must evolve as well. Building flexibility into the DNA of the data warehouse is fundamental to the overall success of the product.

• Quality

– No dataset is perfect, and it is always necessary to validate, scrub, and qualify datasets regardless of pedigree. A careful balance must be struck to eliminate as many flaws as reasonably possible while maintaining the integrity and culture of the original data. This includes standardization as well as alignment where appropriate.

• Secure

– All care should be taken to protect our data as carefully as we would the patients we serve. Data should only be released to those allowed to see it, and only so much data as is necessary to complete the task at hand. All data generated should have a designated life span and disposal process.
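
The “only so much data as is necessary” rule can be illustrated with a small column filter. This is a minimal sketch under stated assumptions: the role names, the `ALLOWED_COLUMNS` mapping, and the `release` function are hypothetical and stand in for whatever access controls the organization actually uses.

```python
# Hypothetical mapping of role -> columns that role is allowed to see.
ALLOWED_COLUMNS = {
    "report_operator": {"claim_id", "paid_amount"},
    "data_analyst": {"claim_id", "member_id", "paid_amount"},
}


def release(rows, role):
    """Return copies of `rows` containing only the columns permitted for `role`."""
    allowed = ALLOWED_COLUMNS.get(role)
    if not allowed:
        # Unknown roles get nothing rather than everything.
        raise PermissionError(f"role {role!r} is not cleared to receive data")
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]


rows = [{"claim_id": "C1", "member_id": "M1", "paid_amount": 10.0}]
operator_view = release(rows, "report_operator")
```

Denying unknown roles by default mirrors the slide’s point that data is released only to those allowed to see it.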

Page 9: Navigating the BI Stack _

DEFINING BI SUCCESS - APPLICATION

• Self Service Enablement

– Data is neither the exclusive domain of IT nor analysts. Data is necessary to support all aspects of the company and therefore all team members should be able to access data necessary to their daily work independently.

• Platform Agnostic

– The purpose of the data warehouse structure should be to align the data we produce and consume with the business needs rather than the originating data source. Thus, the data should be divorced from the paradigm of the originating application insofar as is necessary to align with the organizational structure (and therefore business needs) of the company.

• Fully Integrated

– Regardless of data source or purpose, our business revolves around four core datasets: patient, provider, payor, and benchmarks. These datasets are inextricably linked, and so the data in the data warehouse should be as well. Data should be related in an efficient and accurate way that also allows for unique analytic approaches as necessary.

• Advanced Analytic Support

– The end goal of a data warehouse is not to recreate original data sources or to automate simple tasks. It is to enable the advanced toolsets that are only possible with a much larger perspective of our business than any single operational tool can provide. Enabling sandboxing, modeling, prediction, and other advanced capabilities that can then be integrated into all aspects of decision-making is the hallmark of an evolved data warehouse.

Page 10: Navigating the BI Stack _

BUSINESS INTELLIGENCE IS A CYCLE, NOT A GOAL!

• The BI environment is best viewed as an organic cycle in which capabilities grow and evolve with the business

• The tools must adapt to the business rather than vice-versa

[Diagram: cycle of Interview, Plan, Prototype, Promote, Utilize, Monitor]

Page 11: Navigating the BI Stack _

LET’S BUILD A BI STACK! SOURCE DATA

[Diagram: source data boxes: Claims, External, Auths, Fin / GL, Benchmark, Others]

• At the bottom of the stack is the SOURCE DATA. This is also called the Original Data Source (External) or sometimes just ODS.

• This ODS is generally NOT used for reporting or analysis except when done from within the host application.

• This also becomes the validation “source of truth” until final approval is given later to move into a production space.

• There is one box for every dataset sent into the EDW here. So every application, external data source, etc.

• One important point to understand here is that every time data “moves”, we use a program called an “ETL Process”. ETL means Extract, Transform, and Load.

Page 12: Navigating the BI Stack _

LET’S BUILD A BI STACK! DATA STAGING

[Diagram: source data boxes (Claims, External, Auths, Fin / GL, Benchmark, Others) and the Stage layer]

• In the STAGE layer, all of the data is unified into a single DBMS platform. A DBMS platform is essentially a big database application such as Microsoft SQL Server, Oracle, Teradata, or many others.

• During this step, many different systems feed into the database. Everything from a simple text file to major mainframe systems is organized into a single common data format.

• No data correction takes place at this point. This data should be essentially a “mirror image” of the data in its original source.

• While ETL is happening here, we’re much more focused on the Extract and Load aspects at this point.
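
A staging load that extracts and loads without correcting anything might look like the sketch below. It uses Python’s built-in `sqlite3` as a stand-in for the DBMS platform; the table name `stg_claims`, the column names, and the sample extract are hypothetical.

```python
import csv
import io
import sqlite3


def stage_csv(conn, table, csv_text):
    """Load a CSV extract into a staging table exactly as received.

    Every column is stored as TEXT and no value is corrected, so the
    staged rows remain a "mirror image" of the source extract.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)


# Hypothetical extract: note the blank member_id and the bad amount on C2.
extract = "claim_id,member_id,paid_amount\nC1,M1,100.00\nC2,,abc\n"
conn = sqlite3.connect(":memory:")
loaded = stage_csv(conn, "stg_claims", extract)
staged = conn.execute('SELECT * FROM "stg_claims" ORDER BY rowid').fetchall()
```

The flawed second row is staged untouched on purpose; catching it is left to the later layers.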

Page 13: Navigating the BI Stack _

LET’S BUILD A BI STACK! LOAD LAYER

[Diagram: source data boxes; Stage; Load layer in three environments: DEV, TEST / QA, PROD]

• In the LOAD LAYER, data begins its first set of transformations.

• Data begins to go through qualitative review, and the ETL process looks for data failures, changes, and other such issues.

• There are 3 data “environments” here.

– DEV (Development): a place to invent new ways to work with data

– TEST / QA: a place to validate the proper function of DEV data

– PROD (Production): where the live data is loaded for promotion into EDW.
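
The kind of qualitative review an ETL process performs at this layer can be sketched as a set of row-level checks that split a batch into passed and rejected rows. The field names and the specific rules below are assumptions for illustration only.

```python
from decimal import Decimal, InvalidOperation


def check_claim(row):
    """Return a list of problems found in one staged claim row (empty if clean)."""
    problems = []
    if not row.get("member_id"):
        problems.append("missing member_id")
    try:
        if Decimal(row.get("paid_amount") or "") < 0:
            problems.append("negative paid_amount")
    except InvalidOperation:
        problems.append("paid_amount is not a number")
    return problems


def run_load_checks(rows):
    """Split rows into those that pass every check and those that fail any."""
    passed, rejected = [], []
    for row in rows:
        problems = check_claim(row)
        if problems:
            rejected.append((row, problems))
        else:
            passed.append(row)
    return passed, rejected


# Hypothetical staged rows, all values still stored as text.
staged_rows = [
    {"claim_id": "C1", "member_id": "M1", "paid_amount": "100.00"},
    {"claim_id": "C2", "member_id": "", "paid_amount": "abc"},
]
passed, rejected = run_load_checks(staged_rows)
```

Keeping the rejected rows together with their reasons gives the TEST / QA environment something concrete to validate against.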

Page 14: Navigating the BI Stack _

LET’S BUILD A BI STACK! DATA / ODS LAYER

[Diagram: source data boxes; Stage; Load and Data / ODS layers, each in DEV, TEST / QA, and PROD]

• In the DATA or ODS layer, we see the results of the initial transformations to the data from LOAD.

• ODS in this case means Operational Data Store. This is the data that looks almost exactly the same as the original data, but has passed quality tests and is now stored consistent with most core data rules.

• While this is technically the beginning of the analytic area, data here is not yet related to other data. So, you won’t find an easy way to align a claim with an authorization for example.

• At this layer and lower, IT takes an ownership role while Analytics is responsible for clarifying business needs.

Page 15: Navigating the BI Stack _

LET’S BUILD A BI STACK! EDW LAYER

[Diagram: source data boxes; Stage; Load, Data / ODS, and EDW layers, each in DEV, TEST / QA, and PROD]

• The EDW layer is where the final core data transformations are made, as well as all “standardized” relationships between datasets.

• At this point, the data is scrubbed for accuracy, completeness, and alignment with organizational standards but it still may be very “raw”.

• Few metrics or aggregations, if any, will appear here. This is still a purely data-driven dataset.

• Data here should be generally easy to validate against the ODS data.
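
As an illustration of a “standardized” relationship and of validating EDW data against the ODS, the sketch below relates claims to authorizations and then reconciles row counts. It uses `sqlite3` as a stand-in database; every table and column name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical ODS tables: clean, but not yet related to each other.
    CREATE TABLE ods_claims (claim_id TEXT, auth_id TEXT, paid_amount REAL);
    CREATE TABLE ods_auths (auth_id TEXT, status TEXT);
    INSERT INTO ods_claims VALUES ('C1', 'A1', 100.0), ('C2', 'A2', 250.0), ('C3', NULL, 75.0);
    INSERT INTO ods_auths VALUES ('A1', 'approved'), ('A2', 'denied');

    -- EDW table: each claim aligned with its authorization, if it has one.
    CREATE TABLE edw_claims AS
        SELECT c.claim_id, c.paid_amount, a.auth_id, a.status AS auth_status
        FROM ods_claims AS c
        LEFT JOIN ods_auths AS a ON a.auth_id = c.auth_id;
    """
)


def row_count(table):
    """Count rows in one of the tables created above."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


edw_rows = conn.execute(
    "SELECT claim_id, auth_status FROM edw_claims ORDER BY claim_id"
).fetchall()
counts_match = row_count("edw_claims") == row_count("ods_claims")
```

A LEFT JOIN keeps claims that have no authorization, so the EDW row count can be reconciled directly against the ODS claim count.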

Page 16: Navigating the BI Stack _

LET’S BUILD A BI STACK! SEMANTIC LAYER

[Diagram: source data boxes; Stage; Load, Data / ODS, and EDW layers, each in DEV, TEST / QA, and PROD; then Semantic, OLAP, Sandbox, Prod Models, BI Universe]

• This is where the analytic teams generally live. Most metrics, aggregations, and analytic tools will use this layer for day to day work.

• As necessary, analysts may look deeper into the stack to obtain data not available at this level.

• At this point IT tends to act as steward, while Analytics takes an ownership role.
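
A metric defined at this level, on top of detail-level EDW data, could be sketched as a database view. The example uses `sqlite3`; the view name `sem_paid_per_member` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical detail-level EDW table.
    CREATE TABLE edw_claims (claim_id TEXT, member_id TEXT, paid_amount REAL);
    INSERT INTO edw_claims VALUES ('C1', 'M1', 100.0), ('C2', 'M1', 50.0), ('C3', 'M2', 75.0);

    -- Semantic-layer view: aggregation lives here, not in the EDW itself.
    CREATE VIEW sem_paid_per_member AS
        SELECT member_id, COUNT(*) AS claim_count, SUM(paid_amount) AS total_paid
        FROM edw_claims
        GROUP BY member_id;
    """
)

metrics = conn.execute(
    "SELECT member_id, claim_count, total_paid FROM sem_paid_per_member ORDER BY member_id"
).fetchall()
```

Because the view reads from the EDW table rather than copying it, analysts can still look deeper into the stack when a detail is not exposed at this level.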

Page 17: Navigating the BI Stack _

LET’S BUILD A BI STACK! MODEL LAYER

[Diagram: source data boxes; Stage; Load, Data / ODS, and EDW layers, each in DEV, TEST / QA, and PROD; Semantic, OLAP, Sandbox, Prod Models, BI Universe; Analytic Modeling]

• In this modeling layer, the Advanced Analytics teams will build various data models including predictive, statistical, behavioral, market, etc.

• The outcomes of these models can be used not only as an end-point analysis; they can also continue to feed out of the top of the BI Stack and influence the data in the LOAD layer as well!

• These tools are often developed using highly cleansed datasets and tend to have a much narrower analytic focus, requiring careful interpretation.

Page 18: Navigating the BI Stack _

LET’S BUILD A BI STACK!

[Diagram: source data boxes; Stage; Load, Data / ODS, and EDW layers, each in DEV, TEST / QA, and PROD; Semantic, OLAP, Sandbox, Prod Models, BI Universe; Analytic Modeling; Outbound Data Pumps / ETL]

• In this layer, data is pushed out of the stack. This data generally includes:

– Detailed analytic results routed to other applications internal to the organization

– Data destined to be transformed or to transform data as it migrates into the BI Stack STAGE layer

– Externally focused datasets that have been validated for automated release

• Most often this is an automated solution

• These outputs are generally NOT considered analysis datasets, although often are used as such

Page 19: Navigating the BI Stack _

LET’S BUILD A BI STACK! PRESENTATION LAYER

[Diagram: source data boxes; Stage; Load, Data / ODS, and EDW layers, each in DEV, TEST / QA, and PROD; Semantic, OLAP, Sandbox, Prod Models, BI Universe; Presentation; Analytic Modeling; Outbound Data Pumps / ETL]

• This is the final layer of the BI Stack, and it consists primarily of aggregations, metrics, and visualizations.

• Report developers and operators tend to work mostly in this layer

• This layer will also contain unattended analyses such as automated, subscribed reports, dashboards, and analytics-on-rails datasets

• Most leadership should be comfortable working in this layer. It often has drag and drop interfaces, highly cleansed data, and well documented standards

• It is uncommon for detail level data to be accessible in this layer without special permissions.

Page 20: Navigating the BI Stack _

LET’S BUILD A BI STACK! THE FINAL MODEL

[Diagram: the complete BI Stack: source data boxes (Claims, External, Auths, Fin / GL, Benchmark, Others); Stage; Load, Data / ODS, and EDW layers, each in DEV, TEST / QA, and PROD; Semantic, OLAP, Sandbox, Prod Models, BI Universe; Presentation; Analytic Modeling; Outbound Data Pumps / ETL]

Page 21: Navigating the BI Stack _

LET’S BUILD A BI STACK! MOVEMENT

[Diagram: the complete BI Stack with a “Movement” indicator]

Generally data only moves vertically through the BI Stack.

However, in more advanced implementations the results of analyses and models are routed back into the STAGE layer to influence the data as it promotes upward.

This is another reason an effective BI Stack is called a “Data Pump”.

Each time data “moves” through the BI stack, a program must be written to make that happen. These programs are called ETL processes.

Page 22: Navigating the BI Stack _

REMOVING THE DEVIL FROM THE DETAILS

• Typically, the higher (vertically) you move in the stack, the cleaner the data becomes.

• In addition, data becomes simpler to use because more business rules are applied the higher you go.

• The tradeoff for greater simplicity and cleaner data is a loss of ODS alignment and detail availability.

• The data is still available through effective “Data Lineage” tools, available to analysts, that help explain how the rules influenced the data.
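
One way to picture “Data Lineage” is to record every rule applied to a value together with the result it produced. The sketch below is a toy illustration, not a description of any real lineage product; the function `apply_rules` and the rule names are hypothetical.

```python
def apply_rules(value, rules):
    """Apply named rules in order and record the value after each one.

    Returns the final value and a lineage list of (step name, value) pairs,
    starting with the untouched source value.
    """
    lineage = [("source", value)]
    for name, rule in rules:
        value = rule(value)
        lineage.append((name, value))
    return value, lineage


# Hypothetical business rules for a state-code field.
rules = [
    ("trim whitespace", str.strip),
    ("uppercase state code", str.upper),
]
clean_value, lineage = apply_rules(" tx ", rules)
```

The lineage list lets an analyst trace the clean value back to what the source originally held.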

[Diagram labels: Simple Data, Complex Data; Clean values, ODS Flaws]

Page 23: Navigating the BI Stack _

MAJOR DOMAINS

[Diagram: the complete BI Stack with domains marked against its levels: Business Users, Adv. Analytics Team, IT Oversight, IT EDW Team, Business Analytics, IT Application Team]

Situationally these roles may change, but in general the areas of concern are fairly well aligned with the level within the BI Stack.

Page 24: Navigating the BI Stack _

SO IT’S JUST A BUNCH OF DATABASES?

• The magic of the BI Stack isn’t in the value of the data held therein, but in the relationships between differing datasets.

• Ideally, because these datasets all essentially describe the same thing – our business, patients, and industry – they should all be able to weave into each other.

• The wisdom of crowds is revealed in applying business questions to the many layers of related data. Additionally, the dissenting opinions within the data also provide insight into flaws, misunderstandings, and opportunities.

• It’s in the relationships between data where the true art of analysis becomes visible, and this has a value that far exceeds the intrinsic measure.

• A skilled EDW development team can make a series of databases drive transformational change not otherwise possible.

Page 25: Navigating the BI Stack _

BI MATURITY – ROI PERSPECTIVE

http://www.eurim.org.uk/activities/ig/voi/03-01-06_Executive_Series_Assessing_Your_BI_Maturity.pdf

Page 26: Navigating the BI Stack _

BI MATURITY – UTILIZATION PERSPECTIVE

http://www.eurim.org.uk/activities/ig/voi/03-01-06_Executive_Series_Assessing_Your_BI_Maturity.pdf

Page 27: Navigating the BI Stack _

REPORTING VALUE MATRIX

[Diagram labels: INSIGHT, ACTION, STRATEGY, UNDERSTANDING]

To determine the VALUE of an analysis, we must establish the insight it provides. This is measured using 3 metrics driven by simple questions:

• ACTION: What change will you make using this information?

• STRATEGY: How will you change your approach to the business with this report?

• UNDERSTANDING: What deeper knowledge will you gain from this data?

Page 28: Navigating the BI Stack _

PRIORITIZATION HIERARCHY

Prioritization generally falls along two scales: INTENT and MATERIALITY / RISK

[Diagram labels: Regulatory, Contractual, Internal, Academic, Goodwill; Strategic, Compliance, Operational, Financial, Reputational; High Priority, Low Priority]

Higher priority closer to center & intersects

Page 29: Navigating the BI Stack _

TEAM MEMBERS – BUSINESS CHAMPION

• Responsible for:

• Overall strategy & development of the BI Stack

• Facilitating organizational change related to BI

• Encouraging adoption of BI tools & methods

• Ideal Candidate:

• An experienced leader with deep insight into BI

• Background in analysis

• High-touch effective communicator

• Willing to break new ground

• Success Criteria:

• BI Stack effectiveness

• Innovative approaches leading to efficiency / accuracy / insight breakthroughs

• User adoption and reliance on BI product

Page 30: Navigating the BI Stack _

TEAM MEMBERS – IT CHAMPION

• Responsible for:

• Ensuring implementation of the technical aspects of the BI Stack to support business needs

• Owner of hardware / software

• Manages day to day operations

• Ideal Candidate:

• An experienced leader with deep IT skillsets

• Strong experience in data warehousing, ETL, and server resource management

• Success Criteria:

• “N 9’s” uptime

• Accurate uninterrupted data flows

• Able to accomplish business needs / requirements within specified SLAs
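
For readers unfamiliar with “N 9’s”, the arithmetic can be worked through in Python: N nines of uptime means availability of 1 − 10⁻ᴺ, so the permitted downtime is 10⁻ᴺ of the period. The sketch assumes a 365-day year; the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def allowed_downtime_minutes(nines):
    """Minutes of downtime per year permitted by `nines` nines of uptime.

    For example, 3 nines means 99.9% uptime, i.e. 0.1% of the year down.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return MINUTES_PER_YEAR * 10 ** -nines


three_nines = allowed_downtime_minutes(3)  # 99.9% uptime
five_nines = allowed_downtime_minutes(5)   # 99.999% uptime
```

Under these assumptions, three nines allows roughly 8.8 hours of downtime per year and five nines roughly 5 minutes.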

Page 31: Navigating the BI Stack _

TEAM MEMBERS – PROJECT MANAGER

• Responsible for:

• Owner of the project plan

• Facilitates forward project movement

• “keeping everyone honest”

• Ideal Candidate:

• Highly organized, skilled project manager

• Effective communicator / negotiator

• Forward thinking

• Success Criteria:

• Documented project plan with forecasts, metrics, and progress analysis

• Identify and address all SWOT aspects

• Demonstrable progress made towards strategic goals

Page 32: Navigating the BI Stack _

TEAM MEMBERS – BUSINESS ANALYST

• Responsible for:

• Interviewing stakeholders to understand business needs / rules

• Effective documentation of the BI projects

• Translating needs & rules into actionable development goals / methods

• Ideal Candidate:

• Some experience in programming / development

• Strong documentation skills

• Industry experience

• Success Criteria:

• Effective documentation

• Captured all relevant business needs & rules

• Effective communication

Page 33: Navigating the BI Stack _

TEAM MEMBERS – TRAINER

• Responsible for:

• Teaching the use of the BI tools

• Providing feedback to developers on refinements to tools to further enable adoption

• Ideal Candidate:

– Able to communicate complex concepts effectively

– Patient and skilled communicator

– Skilled with BI tools

• Success Criteria:

– Users report successful use of BI tools following training

– Effective feedback on tools provided to developers to further enhance tools

Page 34: Navigating the BI Stack _

TEAM MEMBERS – DBA

• Responsible for:

• Owner of the core databases related to the BI Stack and manages server hardware

• Oversees ETL efforts

• Teaches advanced coding techniques to support effective use of tools

• Ideal Candidate:

• Highly skilled in DBMS platform

• Strong awareness of organizational needs and vision necessary to meet those needs

• Creative approaches to complex issues

• Success Criteria:

• Successful data migrations / promotions

• No integrity lost in data outside of expectations

• Highly available BI Stack

Page 35: Navigating the BI Stack _

TEAM MEMBERS – ARCHITECT / MODELER

• Responsible for:

• Development of the logical model of the BI stack

• Management of data governance efforts

• Owner of all BI Documentation

• Manages all semantic layers / components

• Ideal Candidate:

• Skilled in both conceptual design and practical implementation of data

• Clear understanding of business needs and desired outcomes

• Strong programming skillset in DBMS / ETL tools

• Success Criteria:

• Completion / maintenance of current data model documents

• Effective data governance (including I/O, security, recovery/destruction, and standards)

• Successful implementation & maintenance of data models that meet business needs

Page 36: Navigating the BI Stack _

TEAM MEMBERS – DATA ANALYST

• Responsible for:

• Development of analytic tools based on the BI logical model

• Providing insight to the organization with actionable data

• Identify trends / behaviors / opportunities

• Ideal Candidate:

• Deep DBMS understanding

• Organizational / business knowledge

• Highly analytical and effective communicator.

• Success Criteria:

• Development & maintenance of BI tools

• Accurate and effective representation of responses to business needs

• Clear communication and actionable guidance

Page 37: Navigating the BI Stack _

TEAM MEMBERS – QUALITY ASSURANCE / GOVERNANCE

• Responsible for:

• Ensuring the overall quality of data as it promotes through the BI Stack

• Executing against the data governance plan

• Act as SME on behalf of the internal / external customer (Ombudsman)

• Ideal Candidate:

• Highly detail oriented

• Skilled in both the data and the context of the business, rules, and needs

• Able to clarify rules / needs and infer the purpose of same

• Success Criteria:

• Elimination and prevention of data quality / integrity issues

• Data is acquired, used, and destroyed according to plan

• Insight is provided in accordance with business needs and within appropriate context

Page 38: Navigating the BI Stack _

TEAM MEMBERS – REPORT DEVELOPER

• Responsible for:

• Initial development of reporting solutions using data and methodologies created by the analyst

• Create effective visualizations of data to ‘paint a picture’

• Determine where on the “Reporting Value Matrix” a report exists

• Ideal Candidate:

• Skilled with BI tools selected by the organization

• Ability to translate analytic products into actionable metrics

• Capable of explaining metrics in a consumable way

• Success Criteria:

• Report volume developed / maintained

• Elimination of duplication

• Standardization of metrics or communication of variations

• Appropriate reporting methodologies

Page 39: Navigating the BI Stack _

TEAM MEMBERS – REPORT OPERATOR

• Responsible for:

• Automation or manual runs of established reporting packages

• Endpoint QA of finalized report

• Distribution of finalized reports

• Ideal Candidate:

• Detail oriented

• Interested in data analytics

• Intermediate skills with analytic tools

• Success Criteria:

• Reports delivered on time to “N 9’s” rate

• No obvious errors delivered to client

• Effective management of reporting stakeholders & recipients

Page 40: Navigating the BI Stack _

TEAM MEMBERS – USER (YOU!)

• Responsible for:

• Utilization of the products of the BI Stack

• Providing feedback on those products

• Communicating any changes to business rules, needs, assumptions, or strategy

• Ideal Candidate:

• Aware of the business needs, etc.

• Has some influence on the outcome of reports with respect to the Reporting Value Matrix

• Able to consume analytic or reporting outputs

• Success Criteria:

• Action taken using reporting products

• Can communicate value of analytic and reporting products

Page 41: Navigating the BI Stack _

THINGS TO REMEMBER

• “Crowd wisdom” values all voices: people and data, harmonizers and dissenters.

• The purpose of BI is Operational and Transformational Insight, NOT reporting.

• Business Intelligence is BOTH a set of tools and the process in which they are used.

• The “BI Stack” includes an EDW, but encompasses much more than the EDW.

• The BI environment is best viewed as a cycle rather than a goal.

• There are many “layers” to the BI Stack, each with a valuable part to play.

• Ultimately the art and value of the BI Stack comes from the creative and innovative relationships within the data.

• Reporting value is determined using the Reporting Value Matrix measures.

• Many players participate in the BI process, each bringing value to the tools.

Page 42: Navigating the BI Stack _

WHAT MORE CAN I SHARE?
