TRANSCRIPT
Copyright 2014 Proprietary and Confidential
NAVIGATING THE BI STACK
DEVELOPING OPERATIONAL AND TRANSFORMATIONAL INSIGHT FROM THE WISDOM OF THE CROWD
CROWD WISDOM
• The “Wisdom of the Crowd” is the collective opinion of a group of individuals rather than that of a single expert. This collective wisdom results in more accurate information and better decisions than an individual’s, because aggregating many opinions cancels out individual bias and idiosyncratic noise.
• When applied to an analytics context, we treat each original data source, in addition to the stakeholders, as one or more “opinions” about an aspect of the business.
• Multiple datasets, brought together in one location with the express purpose of “harmonizing data”, add to the collective wisdom of our crowd.
• The expectation is that while not all individuals will always agree, a ‘majority rule’ approach can be effective in decision making.
• Dissenting “opinions” have an important role to play as well, establishing checks and balances against propagated errors and organizational “groupthink”.
• Some examples of crowd wisdom are criminal juries and Google’s search engine.
This article uses material from the Wikipedia article Wisdom of the Crowd, which is released under the Creative Commons Attribution-ShareAlike License 3.0.
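The ‘majority rule’ idea, with dissenting opinions kept visible, can be sketched in a few lines of Python. This is only an illustration, assuming each data source reports one value for the same business fact; the source names and values below are hypothetical.

```python
from collections import Counter

def crowd_consensus(opinions):
    """Return the majority value, its vote count, and the dissenting sources.

    `opinions` maps a source name to the value that source reports.
    """
    counts = Counter(opinions.values())
    majority_value, votes = counts.most_common(1)[0]
    dissenters = sorted(s for s, v in opinions.items() if v != majority_value)
    return majority_value, votes, dissenters

# Hypothetical example: three sources disagree on a member's plan type.
opinions = {"claims": "PPO", "auths": "PPO", "finance": "HMO"}
value, votes, dissenters = crowd_consensus(opinions)
print(value, votes, dissenters)
```

The dissenting source is returned rather than discarded, so it can still act as a check against propagated errors.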
WHAT IS BI?
Business Intelligence is a set of tools used to implement an approach to understanding one’s business, industry, and the best actions to take to maximize value organizationally.
AN APPROACH
A method of analysis that compounds very large, complex datasets into relationship-driven metrics that an organization can consume and act on across multiple systems.
Examples include data warehousing, semantic / statistical / predictive modeling, OLAP cubes, and big data universes.
A TOOLSET
A collection of software-driven tools used to acquire, interpret, and communicate key business data.
Examples include SQL, Business Objects, MicroStrategy, Crystal Reports, SSIS, SAS, Microsoft Excel, and many others.
WHAT IS A BI STACK THEN?
• The BI Stack is the logical process of using the tools and methods of BI to process data.
• When data moves through the BI Stack, this is called “migrating” or “promoting”, in the sense that the data has passed through the tools successfully and without errors in the tests applied at the previous layers.
• There are many routes through the BI stack, but in general data only moves in one direction. Think of it like a big filter organizing data as it moves through the stack.
• The stack has many disciplines and players involved, and each part requires a specific skillset.
• The end result of the process is simple: actionable data that provides greater insight to an organization.
WHAT IS AN EDW?
• An Enterprise Data Warehouse (EDW) is one part of the BI Stack. While it is a large part of the BI Stack (both physically and logically) it isn’t the end point for data.
• The EDW is very similar to a warehouse in the sense that it is a huge space to put objects for later use.
• Specifically, it is the end point for the data in terms of data transformations. When data reaches the EDW it is, for all intents and purposes, as “clean” as it can be, but it may NOT be totally aligned with all business rules, standards, etc.
• A great deal of work still happens after the data warehouse. From a logical standpoint, the EDW occupies a place about 2/3 of the way through the BI stack.
WHY IS AN EDW INSUFFICIENT FOR ACTIONABLE ANALYSIS?
• An EDW is often perceived as the end point of a BI method, but it’s actually much closer to the beginning from an analytic standpoint.
• Generally, analysts start to develop their work and interpretations once data arrives at the EDW. Most operational reporting will live at this level, but analysis continues on. This is what’s called Operational Insight; while valuable, it focuses on the questions “What am I doing, and what have I done?”
• Most aggregate metrics, analytic models, and semantic relationships cannot occur at this level without significant additional challenges.
IF NOT THE EDW, THEN WHAT?
• An EDW is the foundation of a highly performing analytic toolset. We should think of data less as something to “warehouse” and more as something to keep moving, or “pumping”.
• This is the difference between a static BI Stack and an active BI Stack. One sees EDW as an endpoint, whereas the other sees the EDW as a data pump, constantly refining data to get greater value from it.
• All relational analytic tools are based, in part, on a robust EDW. Most often, they will have other aspects built above it though. These include semantic layers, OLAP cubes, data universes/marts/stores, sandboxes, and other types of tools.
• These tools are often what result in Transformational Insight, or answers to questions such as “What should I do next” or “What will my clients/suppliers/industry do in the future”. Other types of transformational questions are “What should I have done” and “Was my prediction accurate?”
DEFINING BI SUCCESS - SYSTEM
• Accessible: The EDW should be accessible by all team members that interact with data in a way that is consumable, understandable, and simple to obtain.
• Adaptable: As the business evolves, the way we store, perceive, and consume data must evolve as well. Building flexibility into the DNA of the data warehouse is fundamental to the overall success of the product.
• Quality: No dataset is perfect, and it is always necessary to validate, scrub, and qualify datasets regardless of pedigree. A careful balance must be struck to eliminate as many flaws as reasonably possible while maintaining the integrity and culture of the original data. This includes standardization as well as alignment where appropriate.
• Secure: All care should be taken to protect our data as carefully as we would the patients we serve. Data should only be released to those allowed to see it, and only so much data as is necessary to complete the task at hand. All data generated should have a designated life span and disposal process.
DEFINING BI SUCCESS - APPLICATION
• Self Service Enablement: Data is neither the exclusive domain of IT nor analysts. Data is necessary to support all aspects of the company and therefore all team members should be able to access data necessary to their daily work independently.
• Platform Agnostic: The purpose of the data warehouse structure should be to align the data we produce and consume with the business needs rather than the originating data source. Thus, the data should be divorced from the paradigm of the originating application insofar as is necessary to align with the organizational structure (and therefore business needs) of the company.
• Fully Integrated: Regardless of data source or purpose, our business revolves around four core datasets: patient, provider, payor, and benchmarks. These datasets are inextricably linked, and so the data in the data warehouse should be linked as well. Data should be related in an efficient and accurate way that also allows for unique analytic approaches as necessary.
• Advanced Analytic Support: The end goal of a data warehouse is not to recreate original data sources or to automate simple tasks. It is to enable the advanced toolsets that are only possible with a much larger perspective of our business than any single operational tool can provide. Enabling sandboxing, modeling, prediction, and other advanced capabilities that can then be integrated into all aspects of decision-making is the hallmark of an evolved data warehouse.
BUSINESS INTELLIGENCE IS A CYCLE, NOT A GOAL!
• The BI environment is best viewed as an organic cycle in which capabilities grow and evolve with the business
• The tools must adapt to the business rather than vice-versa
[Cycle diagram: Interview, Plan, Prototype, Promote, Utilize, Monitor]
LET’S BUILD A BI STACK! SOURCE DATA
[Diagram: source data boxes: Claims, External, Auths, Fin / GL, Benchmark, Others]
• At the bottom of the stack is the SOURCE DATA. This is also called the Original Data Source (External) or sometimes just ODS.
• This ODS is generally NOT used for reporting or analysis except when done from within the host application.
• This also becomes the validation “source of truth” until final approval is given later to move into a production space.
• There is one box here for every dataset sent into the EDW: every application, external data source, and so on.
• One important point to understand here is that every time data “moves”, we use a program called an “ETL Process”. ETL means Extract, Transform, and Load.
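To make the three ETL steps concrete, here is a minimal sketch in Python using only the standard library. The source text, table name, and column names are hypothetical; a real ETL process would normally be built in the organization's chosen ETL tool rather than written by hand like this.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real one would come from a file or application.
SOURCE_CSV = "claim_id,amount\nC1, 100.50\nC2,75\n"

def extract(text):
    """Extract: read raw rows from the source without changing them."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace and convert amounts to numbers."""
    return [(r["claim_id"].strip(), float(r["amount"])) for r in rows]

def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS claims (claim_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO claims VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
total = conn.execute("SELECT SUM(amount) FROM claims").fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM claims").fetchone()[0]
```

Keeping the three steps as separate functions mirrors the way different layers of the stack emphasize different parts of ETL.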
LET’S BUILD A BI STACK! DATA STAGING
[Diagram: source data boxes (Claims, External, Auths, Fin / GL, Benchmark, Others) with the Stage layer added]
• In the STAGE layer, all of the data is unified into a single DBMS platform. A DBMS platform is essentially a big database application like Microsoft SQL Server, Oracle, Teradata, and many others.
• During this step, many different systems feed into the database. Everything from a simple text file to a major mainframe system is organized into a single common data format.
• No data correction takes place at this point. This data should be essentially a “mirror image” of the data in its original source.
• While ETL is happening here, we’re much more focused on the Extract and Load aspects at this point.
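The “mirror image” rule can be illustrated with a small Python sketch: two differently formatted feeds are copied into one common staging table, and nothing is corrected along the way. The feeds, the `stage` table, and the choice of storing each row as JSON text are all assumptions made for illustration.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw feeds arriving in two different formats.
CSV_FEED = "claim_id,amount\nC1, 100.50\n"
PIPE_FEED = "A7|approved\nA8|\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stage (source TEXT, raw_row TEXT)")

def stage(source, rows):
    """Copy rows into the stage table exactly as received; return the count."""
    staged = 0
    for row in rows:
        conn.execute("INSERT INTO stage VALUES (?, ?)", (source, json.dumps(row)))
        staged += 1
    return staged

claim_rows = list(csv.reader(io.StringIO(CSV_FEED)))[1:]  # drop the header row
auth_rows = list(csv.reader(io.StringIO(PIPE_FEED), delimiter="|"))
staged = stage("claims", claim_rows) + stage("auths", auth_rows)

# Mirror check: what was staged matches the source, flaws included.
first = conn.execute("SELECT raw_row FROM stage WHERE source = 'claims'").fetchone()[0]
mirrored = json.loads(first)
```

Note that the stray space in the amount and the empty authorization status are kept as they arrived; cleaning them up is left to later layers.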
LET’S BUILD A BI STACK! LOAD LAYER
[Diagram: source data boxes; Stage; Load layer in three environments: Load DEV, Load TEST / QA, Load PROD]
• In the LOAD LAYER, data begins its first set of transformations.
• Data begins to go through qualitative review, and the ETL process looks for data failures, changes, and other such issues.
• There are 3 data “environments” here.
• DEV (Development): a place to invent new ways to work with data
• TEST / QA: a place to validate the proper function of DEV data
• PROD (Production): where the live data is loaded for promotion into the EDW.
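As a rough picture of what “looking for data failures” might involve, the Python sketch below separates rows that pass a few basic checks from rows that fail, and records why each one failed. The field names and the specific checks are hypothetical, and all values are assumed to arrive as text.

```python
def review(rows, required=("claim_id", "amount")):
    """Split rows into those that pass basic checks and those that fail."""
    passed, failed = [], []
    for row in rows:
        # A required field is missing if it is absent, None, or blank.
        problems = [f"missing {f}" for f in required
                    if not str(row.get(f) or "").strip()]
        if not problems:
            try:
                if float(row["amount"]) < 0:
                    problems.append("negative amount")
            except ValueError:
                problems.append("amount is not a number")
        if problems:
            failed.append((row, problems))
        else:
            passed.append(row)
    return passed, failed

# Hypothetical staged rows.
rows = [
    {"claim_id": "C1", "amount": "100.50"},
    {"claim_id": "", "amount": "20"},
    {"claim_id": "C3", "amount": "abc"},
]
passed, failed = review(rows)
```

Failed rows are returned together with their reasons rather than silently dropped, so they can be reviewed before anything is promoted.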
LET’S BUILD A BI STACK! DATA / ODS LAYER
[Diagram: source data boxes; Stage; Load and Data / ODS, each in DEV, TEST / QA, and PROD]
• In the DATA or ODS layer, we see the results of the initial transformations to the data from LOAD.
• ODS in this case means Operational Data Store. This is the data that looks almost exactly the same as the original data, but has passed quality tests and is now stored consistent with most core data rules.
• While this is technically the beginning of the analytic area, data here is not yet related to other data. So, you won’t find an easy way to align a claim with an authorization for example.
• At this layer and lower, IT takes an ownership role while Analytics is responsible for clarifying business needs.
LET’S BUILD A BI STACK! EDW LAYER
[Diagram: source data boxes; Stage; Load, Data / ODS, and EDW, each in DEV, TEST / QA, and PROD]
• The EDW layer is where the final core data transformations are made, as well as all “standardized” relationships between datasets.
• At this point, the data is scrubbed for accuracy, completeness, and alignment with organizational standards but it still may be very “raw”.
• Few metrics or aggregations, if any, will appear here. This is still a purely data-driven dataset.
• Data here should be generally easy to validate against the ODS data.
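A “standardized” relationship between datasets, such as aligning a claim with its authorization, might look like the following sketch. It uses Python's built-in sqlite3 module; the table names, columns, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ods_claims (claim_id TEXT, auth_id TEXT, amount REAL);
CREATE TABLE ods_auths (auth_id TEXT, status TEXT);
INSERT INTO ods_claims VALUES ('C1', 'A7', 100.5), ('C2', 'A9', 75.0);
INSERT INTO ods_auths VALUES ('A7', 'approved');

-- The EDW view records the claim-to-authorization relationship once,
-- keeping every claim even when no authorization is found.
CREATE VIEW edw_claim_auth AS
SELECT c.claim_id, c.amount, a.auth_id, a.status
FROM ods_claims AS c
LEFT JOIN ods_auths AS a ON a.auth_id = c.auth_id;
""")

rows = conn.execute(
    "SELECT claim_id, status FROM edw_claim_auth ORDER BY claim_id"
).fetchall()

# Validation against the ODS: no claims gained or lost by the relationship.
ods_count = conn.execute("SELECT COUNT(*) FROM ods_claims").fetchone()[0]
edw_count = conn.execute("SELECT COUNT(*) FROM edw_claim_auth").fetchone()[0]
```

The claim with no matching authorization stays in the result with an empty status, which keeps the EDW easy to reconcile against the ODS row counts.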
LET’S BUILD A BI STACK! SEMANTIC LAYER
[Diagram: the stack so far, with Semantic, OLAP, Sandbox, Prod Models, and BI Universe added]
• This is where the analytic teams generally live. Most metrics, aggregations, and analytic tools will use this layer for day to day work.
• As necessary, analysts may look deeper into the stack to obtain data not available at this level.
• At this point IT tends to act as steward, while Analytics takes an ownership role.
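One way to picture the metrics and aggregations that live at this level is a view that defines a metric once, so that every report uses the same calculation. The sketch below uses Python's built-in sqlite3 module; the table, view, and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edw_claims (claim_id TEXT, provider TEXT, amount REAL);
INSERT INTO edw_claims VALUES
  ('C1', 'P1', 100.0), ('C2', 'P1', 50.0), ('C3', 'P2', 30.0);

-- A semantic-layer style view: the metric definitions live here,
-- not in each individual report.
CREATE VIEW sem_provider_spend AS
SELECT provider,
       COUNT(*)    AS claim_count,
       SUM(amount) AS total_amount,
       AVG(amount) AS avg_amount
FROM edw_claims
GROUP BY provider;
""")

metrics = conn.execute(
    "SELECT provider, claim_count, total_amount, avg_amount "
    "FROM sem_provider_spend ORDER BY provider"
).fetchall()
```

Analysts who need detail not exposed by the view can still query the underlying table, which matches the idea of looking deeper into the stack when necessary.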
LET’S BUILD A BI STACK! MODEL LAYER
[Diagram: the stack so far, with Analytic Modeling added]
• In this modeling layer, the Advanced Analytics teams will build various data models, including predictive, statistical, behavioral, and market models.
• The outcomes of these models can serve as an end-point analysis, but they can also continue to feed out the top of the BI Stack and influence the data in the LOAD layer as well!
• These tools are often developed using highly cleansed datasets and tend to have a much narrower analytic focus, requiring careful interpretation.
LET’S BUILD A BI STACK! OUTBOUND DATA PUMPS / ETL
[Diagram: the stack so far, with Outbound Data Pumps / ETL added]
• In this layer, data is pushed out of the stack. This data generally includes:
• Detailed analytic results routed to other applications internal to the organization
• Data destined to be transformed or to transform data as it migrates into the BI Stack STAGE layer
• Externally focused datasets that have been validated for automated release
• Most often this is an automated solution.
• These outputs are generally NOT considered analysis datasets, although they are often used as such.
LET’S BUILD A BI STACK! PRESENTATION LAYER
[Diagram: the stack so far, with Presentation added]
• This is the final layer of the BI Stack, and it consists primarily of aggregations, metrics, and visualizations.
• Report developers and operators tend to work mostly in this layer
• This layer will also contain unattended analyses such as automated, subscribed reports, dashboards, and analytics-on-rails datasets
• Most leadership should be comfortable working in this layer. It often has drag and drop interfaces, highly cleansed data, and well documented standards
• It is uncommon for detail level data to be accessible in this layer without special permissions.
LET’S BUILD A BI STACK! THE FINAL MODEL
[Diagram: the complete BI Stack: source data boxes (Claims, External, Auths, Fin / GL, Benchmark, Others); Stage; Load, Data / ODS, and EDW, each in DEV, TEST / QA, and PROD; Semantic, OLAP, Sandbox, Prod Models, and BI Universe; Presentation; Analytic Modeling; Outbound Data Pumps / ETL]
LET’S BUILD A BI STACK! MOVEMENT
[Diagram: the complete BI Stack, annotated with Movement]
Generally data only moves vertically through the BI Stack.
However, in more advanced implementations the results of analyses and models are routed back into the STAGE layer to influence the data as it promotes upward.
This is another reason an effective BI Stack is called a “Data Pump”.
Each time data “moves” through the BI stack, a program must be written to make that happen. These programs are called ETL processes.
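The movement rules can be written down as a small check. This Python sketch assumes a simplified, hypothetical list of layer names and treats “vertical” movement as promotion to the next layer up; the one exception it allows is the feedback route from the modeling layer back into STAGE.

```python
# Simplified, hypothetical layer names, ordered from bottom to top.
LAYERS = ["SOURCE", "STAGE", "LOAD", "ODS", "EDW", "SEMANTIC", "MODEL"]

def is_allowed_move(src, dst):
    """Allow one-step upward promotion, plus the feedback route MODEL -> STAGE."""
    if (src, dst) == ("MODEL", "STAGE"):
        return True
    return LAYERS.index(dst) - LAYERS.index(src) == 1

print(is_allowed_move("STAGE", "LOAD"))   # upward promotion
print(is_allowed_move("EDW", "ODS"))      # downward move
print(is_allowed_move("MODEL", "STAGE"))  # the "data pump" feedback route
```

Each allowed move would correspond to one ETL process that has to be written and maintained.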
REMOVING THE DEVIL FROM THE DETAILS
• Typically, the higher (vertically) you move in the stack, the cleaner the data becomes.
• In addition, data becomes simpler to use because more business rules are applied the higher you go.
• The tradeoff for greater simplicity and cleaner data is reduced ODS alignment and detail availability.
• The detail is still available through effective “Data Lineage” tools, which help analysts explain how the rules influenced the data.
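The idea behind data lineage can be sketched as follows: each business rule is applied in order, and every change is recorded so an analyst can later see how a value was altered. This is an illustrative Python sketch with hypothetical rules, not a description of any particular lineage tool.

```python
def apply_rules(value, rules):
    """Apply each (name, function) rule in order and record what changed."""
    lineage = []
    for name, rule in rules:
        new_value = rule(value)
        if new_value != value:
            lineage.append({"rule": name, "before": value, "after": new_value})
        value = new_value
    return value, lineage

# Hypothetical business rules applied to a state code.
rules = [
    ("trim whitespace", str.strip),
    ("uppercase state code", str.upper),
]
clean, lineage = apply_rules(" ca ", rules)
for step in lineage:
    print(step)
```

The cleaned value is what higher layers see, while the lineage record preserves the path back to the original detail.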
[Diagram: scale from “Complex Data” and “ODS Flaws” to “Simple Data” and “Clean values”]
[Diagram labels: Business Users; Adv. Analytics Team; IT Oversight; IT EDW Team; Business Analytics; IT Application Team]
MAJOR DOMAINS
[Diagram: the complete BI Stack]
Situationally these roles may change, but in general the areas of concern are fairly well aligned with the level within the BI Stack.
SO IT’S JUST A BUNCH OF DATABASES?
• The magic of the BI Stack isn’t in the value of the data held therein, but in the relationships between differing datasets.
• Ideally, because these datasets all essentially describe the same thing – our business, patients, and industry – they should all be able to weave into each other.
• The wisdom of crowds is revealed in applying business questions to the many layers of related data. Additionally, the dissenting opinions within the data also provide insight into flaws, misunderstandings, and opportunities.
• It’s in the relationships between data where the true art of analysis becomes visible, and this has a value that far exceeds the intrinsic value of the data itself.
• A skilled EDW development team can make a series of databases drive transformational change not otherwise possible.
BI MATURITY – ROI PERSPECTIVE
http://www.eurim.org.uk/activities/ig/voi/03-01-06_Executive_Series_Assessing_Your_BI_Maturity.pdf
BI MATURITY – UTILIZATION PERSPECTIVE
http://www.eurim.org.uk/activities/ig/voi/03-01-06_Executive_Series_Assessing_Your_BI_Maturity.pdf
REPORTING VALUE MATRIX
[Diagram: INSIGHT, with ACTION, STRATEGY, and UNDERSTANDING]
To determine the VALUE of an analysis, we must establish the insight it provides. This is measured using 3 metrics driven by simple questions:
ACTION: What change will you make using this information?
STRATEGY: How will you change your approach to the business with this report?
UNDERSTANDING: What deeper knowledge will you gain from this data?
PRIORITIZATION HIERARCHY
Prioritization generally falls along two scales: INTENT and MATERIALITY / RISK.
[Diagram labels: Regulatory, Contractual, Internal, Academic, Goodwill; Strategic, Compliance, Operational, Financial, Reputational; High Priority; Low Priority]
Higher priority closer to center & intersects.
TEAM MEMBERS – BUSINESS CHAMPION
• Responsible for:
• Overall strategy & development of the BI Stack
• Facilitating organizational change related to BI
• Encouraging adoption of BI tools & methods
• Ideal Candidate:
• An experienced leader with deep insight into BI
• Background in analysis
• High-touch effective communicator
• Willing to break new ground
• Success Criteria:
• BI Stack effectiveness
• Innovative approaches leading to efficiency / accuracy / insight breakthroughs
• User adoption and reliance on BI product
TEAM MEMBERS – IT CHAMPION
• Responsible for:
• Ensuring implementation of the technical aspects of the BI Stack to support business needs
• Owner of hardware / software
• Manages day to day operations
• Ideal Candidate:
• An experienced leader with deep IT skillsets
• Strong experience in data warehousing, ETL, and server resource management
• Success Criteria:
• “N 9’s” uptime
• Accurate uninterrupted data flows
• Able to accomplish business needs / requirements within specified SLAs
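An “N 9’s” target translates directly into an allowed amount of downtime: N nines means availability of 1 − 10^(−N). The short Python sketch below does that arithmetic; the 365-day period is an assumption, and real SLAs may measure over a different window.

```python
def allowed_downtime_minutes(nines, period_days=365):
    """Minutes of downtime permitted by an availability of N nines."""
    downtime_fraction = 10 ** (-nines)  # e.g. 3 nines -> 0.001 (99.9% uptime)
    return downtime_fraction * period_days * 24 * 60

for n in (2, 3, 4, 5):
    print(f"{n} nines: about {allowed_downtime_minutes(n):.1f} minutes per year")
```

For example, three nines (99.9%) allows roughly 525.6 minutes of downtime per year, while five nines (99.999%) allows only about 5.3 minutes.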
TEAM MEMBERS – PROJECT MANAGER
• Responsible for:
• Owner of the project plan
• Facilitates forward project movement
• “keeping everyone honest”
• Ideal Candidate:
• Highly organized, skilled project manager
• Effective communicator / negotiator
• Forward thinking
• Success Criteria:
• Documented project plan with forecasts, metrics, and progress analysis
• Identify and address all SWOT aspects
• Demonstrable progress made towards strategic goals
TEAM MEMBERS – BUSINESS ANALYST
• Responsible for:
• Interviewing stakeholders to understand business needs / rules
• Effective documentation of the BI projects
• Translating needs & rules into actionable development goals / methods
• Ideal Candidate:
• Some experience in programming / development
• Strong documentation skills
• Industry experience
• Success Criteria:
• Effective documentation
• Captured all relevant business needs & rules
• Effective communication
TEAM MEMBERS – TRAINER
• Responsible for:
• Teaching the use of the BI tools
• Providing feedback to developers on refinements to tools to further enable adoption
• Ideal Candidate:
• Able to communicate complex concepts effectively
• Patient and skilled communicator
• Skilled with BI tools
• Success Criteria:
• Users report successful use of BI tools following training
• Effective feedback on tools provided to developers to further enhance tools
TEAM MEMBERS – DBA
• Responsible for:
• Owner of the core databases related to the BI Stack and manages server hardware
• Oversees ETL efforts
• Teaches advanced coding techniques to support effective use of tools
• Ideal Candidate:
• Highly skilled in DBMS platform
• Strong awareness of organizational needs and vision necessary to meet those needs
• Creative approaches to complex issues
• Success Criteria:
• Successful data migrations / promotions
• No integrity lost in data outside of expectations
• Highly available BI Stack
TEAM MEMBERS – ARCHITECT / MODELER
• Responsible for:
• Development of the logical model of the BI stack
• Management of data governance efforts
• Owner of all BI Documentation
• Manages all semantic layers / components
• Ideal Candidate:
• Skilled in both conceptual design and practical implementation of data
• Clear understanding of business needs and desired outcomes
• Strong programming skillset in DBMS / ETL tools
• Success Criteria:
• Completion / maintenance of current data model documents
• Effective data governance (including I/O, security, recovery/destruction, and standards)
• Successful implementation & maintenance of data models that meet business needs
TEAM MEMBERS – DATA ANALYST
• Responsible for:
• Development of analytic tools based on the BI logical model
• Providing insight to the organization with actionable data
• Identify trends / behaviors / opportunities
• Ideal Candidate:
• Deep DBMS understanding
• Organizational / business knowledge
• Highly analytical and effective communicator.
• Success Criteria:
• Development & maintenance of BI tools
• Accurate and effective representation of responses to business needs
• Clear communication and actionable guidance
TEAM MEMBERS – QUALITY ASSURANCE / GOVERNANCE
• Responsible for:
• Ensuring the overall quality of data as it promotes through the BI Stack
• Executing against the data governance plan
• Act as SME on behalf of the internal / external customer (Ombudsman)
• Ideal Candidate:
• Highly detail oriented
• Skilled in both the data and the context of the business, rules, and needs
• Able to clarify rules / needs and infer the purpose of same
• Success Criteria:
• Elimination and prevention of data quality / integrity issues
• Data is acquired, used, and destroyed according to plan
• Insight is provided in accordance with business needs and within appropriate context
TEAM MEMBERS – REPORT DEVELOPER
• Responsible for:
• Initial development of reporting solutions using data and methodologies created by the analyst
• Create effective visualizations of data to ‘paint a picture’
• Determine where on the “Reporting Value Matrix” a report exists
• Ideal Candidate:
• Skilled with BI tools selected by the organization
• Ability to translate analytic products into actionable metrics
• Capable of explaining metrics in a consumable way
• Success Criteria:
• Report volume developed / maintained
• Elimination of duplication
• Standardization of metrics or communication of variations
• Appropriate reporting methodologies
TEAM MEMBERS – REPORT OPERATOR
• Responsible for:
• Automation or manual runs of established reporting packages
• Endpoint QA of finalized report
• Distribution of finalized reports
• Ideal Candidate:
• Detail oriented
• Interested in data analytics
• Intermediate skills with analytic tools
• Success Criteria:
• Reports delivered on time to “N 9’s” rate
• No obvious errors delivered to client
• Effective management of reporting stakeholders & recipients
TEAM MEMBERS – USER (YOU!)
• Responsible for:
• Utilization of the products of the BI Stack
• Providing feedback on those products
• Communicating any changes to business rules, needs, assumptions, or strategy
• Ideal Candidate:
• Aware of the business needs, etc.
• Has some influence on the outcome of reports with respect to the Reporting Value Matrix
• Able to consume analytic or reporting outputs
• Success Criteria:
• Action taken using reporting products
• Can communicate value of analytic and reporting products
THINGS TO REMEMBER
• “Crowd wisdom” values all voices: people and data, harmonizers and dissenters.
• The purpose of BI is Operational and Transformational Insight, NOT reporting.
• Business Intelligence is BOTH a set of tools and the process in which they are used.
• The “BI Stack” includes an EDW, but encompasses much more than the EDW.
• The BI environment is best viewed as a cycle rather than a goal.
• There are many “layers” to the BI Stack, each with a valuable part to play.
• Ultimately the art and value of the BI Stack comes from the creative and innovative relationships within the data.
• Reporting value is determined using the Reporting Value Matrix measures.
• Many players participate in the BI process, each bringing value to the tools.
WHAT MORE CAN I SHARE?