bidw concepts
TRANSCRIPT
-
8/8/2019 BIDW Concepts
1/56
-
Agenda
Data warehousing overview
Data warehouse vs. OLTP
Data warehouse vs. data mart
-
integration * intelligence * insight
What is BI?
Business intelligence (BI) is a broad category of application programs and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions.
BI applications include the activities of decision support, query and reporting, online analytical processing (OLAP), statistical analysis, forecasting, and data mining.
Example: Business Objects (www.businessobjects.com)
-
BI in a Nutshell
Raw Data
-
A producer wants to know...
Which are our lowest/highest margin customers?
Who are my customers and what products are they buying?
Which customers are most likely to go to the competition?
What impact will new products/services have on revenue and margins?
What product promotions have the biggest impact on revenue?
What is the most effective distribution channel?
-
Data, data everywhere, yet...
I can't find the data I need
data is scattered over the network
many versions, subtle differences
I can't get the data I need
need an expert to get the data
I can't understand the data I found
available data poorly documented
I can't use the data I found
results are unexpected
data needs to be transformed from one form to another
-
What is a Data Warehouse?
A single, complete and consistent store of data obtained from a variety of different sources, made available to end users in a way they can understand and use in a business context.
[Barry Devlin]
-
What are the users saying...
Data should be integrated across the enterprise
Summary data has a real value to the organization
Historical data holds the key to understanding data over time
What-if capabilities are required
-
What is Data Warehousing?
A process of transforming data into information and making it available to users in a timely enough manner to make a difference
Data --> Information
-
Evolution
60s: Batch reports
hard to find and analyze information
inflexible and expensive, reprogram for every new request
70s: Terminal-based DSS and EIS (executive information systems)
still inflexible, not integrated with desktop tools
80s: Desktop data access and analysis tools
query tools, spreadsheets, GUIs
easier to use, but only access operational databases
90s till now: Data warehousing with integrated OLAP engines and tools, real-time DW
-
Data Warehouse
A data warehouse is a
subject-oriented
integrated
time-varying
non-volatile
accessible
collection of data that is used primarily in organizational decision making.
-- Bill Inmon, Building the Data Warehouse, 1996
-
Explorers, Farmers and Tourists
Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data
Farmers: Harvest information from known access paths
Tourists: Browse information harvested by farmers
-
Data Warehouse Architecture
Sources: Relational Databases, Legacy Data, Purchased Data, ERP Systems
Extraction, Cleansing
Optimized Loader
Data Warehouse Engine
Metadata Repository
Analyze, Query
-
Data Mining works with Warehouse Data
Data Warehousing provides the Enterprise with a memory
Data Mining provides the Enterprise with intelligence
-
What makes data mining possible?
Advances in the following areas are making data mining deployable:
data warehousing
better and more data (i.e., operational, behavioral, and demographic)
the emergence of easily deployed data mining tools and
the advent of new data mining techniques.
-- Gartner Group
-
Why Separate Data Warehouse?
Performance
Operational database designed & tuned for known transactions & workloads.
Complex OLAP queries would degrade performance for operational transactions.
Special data organization, access & implementation methods needed for multidimensional views & queries.
Function
Missing data: Decision support requires historical data, which operational databases do not typically maintain.
Data consolidation: Decision support requires consolidation (aggregation, summarization) of data from many heterogeneous sources: operational databases, external sources.
Data quality: Different sources typically use inconsistent data representations, codes and formats, which have to be reconciled.
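The data quality point can be made concrete with a small sketch. It assumes two hypothetical source systems (a CRM and a billing system) that encode the same customer attributes differently. The field names, code mappings and date formats are illustrative assumptions, not part of the original slides.

```python
from datetime import datetime

# Assumed mappings: every source-specific code is translated to one warehouse code.
GENDER_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}
# Assumed date formats used by the hypothetical sources.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def reconcile_gender(raw):
    """Map a source-specific gender code to one warehouse code ('M', 'F' or None)."""
    return GENDER_CODES.get(str(raw).strip().lower())


def reconcile_date(raw):
    """Parse a date written in any known source format into ISO 8601 text."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unknown format: leave for manual review


crm_row = {"gender": "male", "joined": "2019-08-08"}      # hypothetical CRM record
billing_row = {"gender": 1, "joined": "08/08/2019"}       # hypothetical billing record

reconciled = [
    {"gender": reconcile_gender(r["gender"]), "joined": reconcile_date(r["joined"])}
    for r in (crm_row, billing_row)
]
```

After reconciliation both records carry the same representation, which is what lets the warehouse consolidate them.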
-
Benefits of a Data Warehouse
Reliable reporting
Rapid access to data
Integrated data
Flexible presentation of data
Better decision making
-
So, what's different?
-
Application-Orientation vs. Subject-Orientation
Application-Orientation (Operational Database): Loans, Credit Card, Trust, Savings
Subject-Orientation (Data Warehouse): Customer, Vendor, Product, Activity
OLTP vs Data Warehouse
OLTP | Warehouse (DSS)
Application oriented | Subject oriented
Used to run business | Used to analyze business
Detailed data | Summarized and refined
Current, up to date | Snapshot data
Isolated data | Integrated data
Repetitive access | Ad-hoc access
Clerical user | Knowledge user (manager)
-
OLTP vs Data Warehouse
OLTP | Data Warehouse
Performance sensitive | Performance relaxed
Few records accessed at a time (tens) | Large volumes accessed at a time (millions)
Read/update access | Mostly read (batch update)
No data redundancy | Redundancy present
Database size 100 MB - 100 GB | Database size 100 GB - few terabytes
Thousands of users | Hundreds of users
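The difference in access pattern can be sketched with Python's built-in sqlite3 module. The orders table and its contents are made up for illustration. A real OLTP system and a real warehouse would be separate databases, so a single in-memory table is only a simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "Ann", "East", 120.0), (2, "Bob", "West", 80.0),
     (3, "Ann", "East", 40.0), (4, "Cid", "West", 60.0)],
)

# OLTP-style access: read and update a single record located by its key.
conn.execute("UPDATE orders SET amount = 90.0 WHERE order_id = 2")
one_order = conn.execute(
    "SELECT customer, amount FROM orders WHERE order_id = 2"
).fetchone()

# Warehouse-style access: a read-only query that scans and summarizes many records.
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
```

The first query touches one row and changes it. The second reads every row and returns only totals, which is the kind of load the slides suggest keeping away from the operational database.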
-
To summarize ...
OLTP systems are used to run a business
The data warehouse helps to optimize the business
-
Why Now?
Data is being produced
ERP provides clean data
The computing power is available
The computing power is affordable
The competitive pressures are strong
Commercial products are available
-
Data Warehouses: Architecture, Design & Construction
DW Architecture
Loading, refreshing
Structuring/Modeling
DWs and Data Marts
-
Stages in Data Warehousing Life Cycle
-
Data Warehouse Architectures
Generic Two-Level Architecture
Independent Data Mart
Dependent Data Mart and Operational Data Store
All involve some form of extraction, transformation and loading (ETL)
-
Generic two-level architecture
ETL
One, company-wide warehouse
Periodic extraction: data is not completely current in warehouse
-
Independent Data Mart
Data marts: mini-warehouses, limited in scope
ETL
Separate ETL for each independent data mart
Data access complexity due to multiple data marts
-
Dependent data mart with operational data store
ETL
Single ETL for enterprise data warehouse (EDW)
Simpler data access
ODS provides option for obtaining current data
Dependent data marts loaded from EDW
-
The ETL Process
Capture
Scrub or data cleansing
Transform
Load
ETL = Extract, transform, and load
-
Steps in data reconciliation
Capture = extract: obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse
Static extract = capturing a snapshot of the source data at a point in time
Incremental extract = capturing changes that have occurred since the last static extract
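The two kinds of extract can be sketched as follows. The sketch assumes the source keeps a last-modified timestamp on every row. The `updated_at` field and the sample rows are hypothetical, and real systems may instead rely on change logs or triggers.

```python
from datetime import datetime

# Hypothetical source rows; 'updated_at' is an assumed last-modified column.
source = [
    {"id": 1, "name": "Ann", "updated_at": datetime(2019, 8, 1)},
    {"id": 2, "name": "Bob", "updated_at": datetime(2019, 8, 5)},
    {"id": 3, "name": "Cid", "updated_at": datetime(2019, 8, 7)},
]


def static_extract(rows):
    """Static extract: copy the whole chosen subset as it is right now."""
    return [dict(r) for r in rows]


def incremental_extract(rows, last_extract_time):
    """Incremental extract: only rows changed since the previous extract."""
    return [dict(r) for r in rows if r["updated_at"] > last_extract_time]


snapshot = static_extract(source)
changes = incremental_extract(source, datetime(2019, 8, 4))
```

Here `snapshot` holds all three rows, while `changes` holds only the rows modified after 4 August 2019.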
-
Steps in data reconciliation (continued)
Scrub = cleanse: uses pattern recognition and AI techniques to upgrade data quality
Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies
Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data
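A few of the scrubbing tasks listed above (reformatting, duplicate removal, logging of missing data) can be sketched with simple rules. This is a deliberately small rule-based illustration rather than the pattern recognition or AI techniques the slide mentions, and the records and field names are made up.

```python
import re

# Hypothetical raw customer records with typical defects.
raw = [
    {"name": "  ann LEE ", "phone": "555-0101", "city": "Pune"},
    {"name": "Ann Lee", "phone": "(555) 0101", "city": "Pune"},  # duplicate of the first
    {"name": "Bob Roy", "phone": "", "city": "Delhi"},           # missing phone
]


def scrub(record):
    """Reformat one record: tidy name spacing and casing, keep only digits in phone."""
    name = " ".join(record["name"].split()).title()
    phone = re.sub(r"\D", "", record["phone"]) or None
    return {"name": name, "phone": phone, "city": record["city"].strip()}


clean, errors, seen = [], [], set()
for rec in map(scrub, raw):
    key = (rec["name"], rec["phone"])
    if rec["phone"] is None:
        errors.append(("missing phone", rec["name"]))  # error detection/logging
    if key in seen:
        continue  # duplicate data: keep the first occurrence only
    seen.add(key)
    clean.append(rec)
```

After the loop, `clean` holds two records (the two spellings of Ann Lee collapse into one) and `errors` lists the record that still needs a phone number.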
-
Steps in data reconciliation (continued)
Transform = convert data from format of operational system to format of data warehouse
Record-level:
Selection: data partitioning
Joining: data combining
Aggregation: data summarization
Field-level:
single-field: from one field to one field
multi-field: from many fields to one, or one field to many
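The record-level and field-level transformations can be sketched in plain Python. The order and customer data, and the field names, are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical operational data.
orders = [
    {"order_id": 1, "cust_id": 10, "status": "shipped", "amount_cents": 1250},
    {"order_id": 2, "cust_id": 11, "status": "cancelled", "amount_cents": 900},
    {"order_id": 3, "cust_id": 10, "status": "shipped", "amount_cents": 750},
]
customers = {10: {"first": "Ann", "last": "Lee"}, 11: {"first": "Bob", "last": "Roy"}}

# Record-level selection: keep only the partition of rows of interest.
shipped = [o for o in orders if o["status"] == "shipped"]

# Record-level joining: combine each order with its customer record.
joined = [{**o, **customers[o["cust_id"]]} for o in shipped]

# Field-level transformations.
for row in joined:
    # single-field: one source field becomes one target field (cents to currency units)
    row["amount"] = row.pop("amount_cents") / 100
    # multi-field: many source fields become one target field
    row["customer"] = f"{row.pop('first')} {row.pop('last')}"

# Record-level aggregation: summarize amounts per customer.
totals = defaultdict(float)
for row in joined:
    totals[row["customer"]] += row["amount"]
```

The cancelled order is filtered out by the selection step, so only the two shipped orders contribute to `totals`.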
-
Steps in data reconciliation (continued)
Load/Index = place transformed data into the warehouse and create indexes
Refresh mode: bulk rewriting of target data at periodic intervals
Update mode: only changes in source data are written to data warehouse
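The two load modes can be sketched with sqlite3. The `dw_customer` table, its index and the sample rows are hypothetical. SQLite's `INSERT OR REPLACE` stands in for whatever upsert mechanism a real warehouse would use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw_customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
# The "index" part of load/index: support lookups by city.
conn.execute("CREATE INDEX idx_dw_customer_city ON dw_customer (city)")


def refresh_load(rows):
    """Refresh mode: bulk rewrite of the whole target table."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM dw_customer")
        conn.executemany("INSERT INTO dw_customer VALUES (?, ?, ?)", rows)


def update_load(changed_rows):
    """Update mode: write only the rows that changed in the source."""
    with conn:
        conn.executemany("INSERT OR REPLACE INTO dw_customer VALUES (?, ?, ?)", changed_rows)


refresh_load([(1, "Ann", "Pune"), (2, "Bob", "Delhi")])
update_load([(2, "Bob", "Mumbai"), (3, "Cid", "Pune")])
loaded = conn.execute("SELECT id, name, city FROM dw_customer ORDER BY id").fetchall()
```

Refresh mode is simpler but rewrites everything. Update mode touches only the changed rows, at the cost of having to know what changed (for example from an incremental extract).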
-
Data Warehouse vs. Data Marts
What comes first ?
-
Data Mart
Data mart is:
A functional segment of an enterprise restricted for purposes of security, locality, performance, or business necessity using modeling and information delivery techniques identical to data warehousing.
-
Data Mart
Why build a data mart?
Allows an organization to visualize the large but focus on the small and attainable.
Provides a platform for rapid delivery of an operational system.
Minimizes risk.
A corporate warehouse can be constructed from the union of the enterprise data marts.
-
Data Mart- Approach
Physical data warehouse (physical)
Data warehouse --> data marts
Data marts --> data warehouse
Parallel data warehouse and data marts
-
Top-down
SOURCE DATA
External Data
Operational Data
Staging Area
Data Warehouse
Data Marts
Physical Data Warehouse: Data Warehouse --> Data Marts
-
Bottom-up approach
SOURCE DATA
External Data
Operational Data
Staging Area
Data Warehouse
Data Marts
Physical Data Warehouse: Data Marts --> Data Warehouse
-
Hybrid
SOURCE DATA
External Data
Operational Data
Staging Area
Data Warehouse
Data Marts
Physical Data Warehouse: Parallel Data Warehouse & Data Marts
-
Schema Design
Database organization
must look like business
must be recognizable by business user
approachable by business user
Must be simple
Schema Types
Star Schema
Fact Constellation Schema
Snowflake schema
-
Conceptual Modeling of Data Warehouses
Modeling data warehouses: dimensions & measures
Star schema: A fact table in the middle connected to a set of dimension tables
Snowflake schema: A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake
Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation
-
Dimension Tables
Dimension tables
Define business in terms already familiar to users
Wide rows with lots of descriptive text
Small tables (about a million rows)
Joined to fact table by a foreign key
heavily indexed
typical dimensions
time periods, geographic region (markets, cities), products, customers, salesperson, etc.
-
Fact Table
Central table
mostly raw numeric items
narrow rows, a few columns at most
large number of rows (millions to a billion)
Access via dimensions
-
Example of Star Schema
time (dimension): time_key, day, day_of_the_week, month, quarter, year
location (dimension): location_key, street, city, province_or_street, country
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
item (dimension): item_key, item_name, brand, type, supplier_type
branch (dimension): branch_key, branch_name, branch_type
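A cut-down version of this star schema can be sketched in SQLite through Python's sqlite3 module. Only the time and location dimensions and two measures are included, a few columns are omitted for brevity, and all rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "time" (time_key INTEGER PRIMARY KEY, day INTEGER, month INTEGER,
                     quarter TEXT, year INTEGER);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                       country TEXT);
-- Fact table: foreign keys to each dimension plus the numeric measures.
CREATE TABLE sales (
    time_key     INTEGER REFERENCES "time"(time_key),
    location_key INTEGER REFERENCES location(location_key),
    units_sold   INTEGER,
    dollars_sold REAL
);
INSERT INTO "time" VALUES (1, 8, 8, 'Q3', 2019), (2, 9, 11, 'Q4', 2019);
INSERT INTO location VALUES (1, '1 Main St', 'Pune', 'India'),
                            (2, '2 High St', 'Delhi', 'India');
INSERT INTO sales VALUES (1, 1, 2, 2000.0), (1, 2, 5, 1500.0), (2, 2, 1, 1000.0);
""")

# Star join: the fact table is accessed via its dimensions, then summarized.
sales_by_quarter_city = conn.execute("""
    SELECT t.quarter, l.city, SUM(s.dollars_sold)
    FROM sales AS s
    JOIN "time"   AS t ON t.time_key = s.time_key
    JOIN location AS l ON l.location_key = s.location_key
    GROUP BY t.quarter, l.city
    ORDER BY t.quarter, l.city
""").fetchall()
```

Every dimension is one join away from the fact table, which keeps analytical queries short and predictable.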
-
Example of Snowflake Schema
time (dimension): time_key, day, day_of_the_week, month, quarter, year
location (dimension): location_key, street, city_key
city (dimension): city_key, city, province_or_street, country
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
item (dimension): item_key, item_name, brand, type, supplier_key
supplier (dimension): supplier_key, supplier_type
branch (dimension): branch_key, branch_name, branch_type
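The effect of snowflaking can be sketched by normalizing the supplier out of the item dimension, as in the slide. The sketch keeps only the columns needed to show the extra join, and the rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- supplier is split out of item, so item carries supplier_key instead of supplier_type.
CREATE TABLE supplier (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item (
    item_key     INTEGER PRIMARY KEY,
    item_name    TEXT,
    supplier_key INTEGER REFERENCES supplier(supplier_key)
);
CREATE TABLE sales (item_key INTEGER REFERENCES item(item_key), units_sold INTEGER);
INSERT INTO supplier VALUES (1, 'wholesale'), (2, 'retail');
INSERT INTO item VALUES (1, 'Laptop', 1), (2, 'Phone', 2), (3, 'Tablet', 1);
INSERT INTO sales VALUES (1, 2), (2, 5), (3, 4);
""")

# Reaching supplier_type now takes two joins: fact -> item -> supplier.
units_by_supplier_type = conn.execute("""
    SELECT sp.supplier_type, SUM(s.units_sold)
    FROM sales AS s
    JOIN item     AS i  ON i.item_key = s.item_key
    JOIN supplier AS sp ON sp.supplier_key = i.supplier_key
    GROUP BY sp.supplier_type
    ORDER BY sp.supplier_type
""").fetchall()
```

The trade-off is less redundancy in the dimension tables in exchange for an additional join per normalized level.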
-
Example of Fact Constellation
time (dimension): time_key, day, day_of_the_week, month, quarter, year
location (dimension): location_key, street, city, province_or_street, country
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
item (dimension): item_key, item_name, brand, type, supplier_type
branch (dimension): branch_key, branch_name, branch_type
Shipping Fact Table: time_key, item_key, shipper_key, from_location, to_location, dollars_cost, units_shipped
shipper (dimension): shipper_key, shipper_name, location_key, shipper_type
-
Dimensional model
Visualise a dimensional model as a CUBE (hypercube, because dimensions can be more than 3 in number)
Operations for OLAP
Drill Down: higher level of detail
Roll Up: summarized level of data
(The navigation path is determined by hierarchies within dimensions.)
Slice: cuts through the cube. Users can focus on specific perspectives
Dice: rotates the cube to another perspective (change the dimension)
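The OLAP operations can be sketched on a tiny in-memory cube. The cube contents and the helper names `aggregate` and `select` are hypothetical. Dice is taken here in its common textbook sense of restricting several dimensions at once, which is an assumption that goes beyond the wording on the slide.

```python
from collections import defaultdict

# Cube cells keyed by (year, quarter, city, product) -> units sold. Data is made up.
cube = {
    (2019, "Q1", "Pune", "Laptop"): 3,
    (2019, "Q1", "Delhi", "Laptop"): 2,
    (2019, "Q2", "Pune", "Phone"): 5,
    (2019, "Q2", "Delhi", "Phone"): 4,
}
DIMS = ("year", "quarter", "city", "product")


def aggregate(cells, keep):
    """Sum the measure over every dimension that is not listed in 'keep'."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)


def select(cells, **fixed):
    """Keep only the cells whose dimensions match the given values."""
    return {
        k: v for k, v in cells.items()
        if all(k[DIMS.index(d)] == val for d, val in fixed.items())
    }


roll_up = aggregate(cube, ["year"])                # summarized level: totals per year
drill_down = aggregate(cube, ["year", "quarter"])  # more detail: totals per quarter
slice_ = select(cube, city="Pune")                 # fix one dimension
dice = select(cube, city="Pune", quarter="Q2")     # fix several dimensions
```

Roll up and drill down move along the year-to-quarter hierarchy of the time dimension, which is the navigation path the slide refers to.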
-
Drill down and roll up
-
Slice and Dice
-
Metadata Repository
Administrative metadata
source databases and their contents
gateway descriptions
warehouse schema, view & derived data definitions
dimensions, hierarchies
pre-defined queries and reports
data mart locations and contents
data partitions
data extraction, cleansing, transformation rules, defaults
data refresh and purging rules
user profiles, user groups
security: user authorization, access control
-
Metadata Repository (2)
Business metadata
business terms and definitions
ownership of data
charging policies
Operational metadata
data lineage: history of migrated data and sequence of transformations applied
currency of data: active, archived, purged
monitoring information: warehouse usage statistics, error reports, audit trails.
-
The BI/DW Lifecycle
Source: http://www.atre.com/navigator/#3
-
The BI/DW Lifecycle
Source: http://www.atre.com
-
Popular BI/DW Suites & Tools
Oracle: LDMs & Database, Oracle Warehouse Builder, Oracle Discoverer & Oracle Reporting, BI Beans & JOLAP API
Microsoft: Database, SQL Server Analysis Services, SQL Server Reporting Services, SQL Server Integration Services
Teradata
Redbrick
Hyperion Essbase
Oracle Express Server
Informatica
Ab initio
Any Database SQL Language or any other Programming Language
Cognos BI Suite
BusinessObjects & Crystal
Microstrategy
Actuate
Hyperion/Brio (Acquired by Hyperion)
SAP BW
Peoplesoft EPM
Embarcadero Suite
Erwin
Cognos PerformanceApps
Tool categories: Planning & Budgeting, Full Suites, Reporting Tools, ETL Tools, Databases, Specialized Tools
IBM: Logical Data Model & IBM DB2 Database, DB2 Cube Views, ETL Ascential DataStage, DB2 Alphablox
SAS 9, the BI Platform: Logical Data Model & SAS Database, SAS ETL, BI and Reporting, SAS Data Mining