A Data Warehousing and Business Intelligence Tutorial
Starring Sakila
Welcome!
Matt Casters, Chief Data Integration, Pentaho
http://www.ibridge.be/ @mattcasters
Roland Bouman, Software Engineer, Pentaho
http://rpbouman.blogspot.com/ @rolandbouman
Pentaho
● Commercial Open Source Business Intelligence
– Full BI suite since 2005
● Projects: Kettle (DI & ETL), JFree (Reporting), Mondrian (OLAP), Weka (Data Mining)
● Community: CDF (Dashboarding), Saiku (OLAP)
● Recent: Focus on “Big Data”, esp. Hadoop
● http://www.pentaho.com
● http://sourceforge.net/projects/pentaho/
Agenda
● Business Intelligence
● Data Warehousing
● Anatomy of a Data Warehouse
● Physical Implementation
● Sakila – a Star is Born
● Filling the Data Warehouse
● Presenting the Data - BI Applications
Part I: Business Intelligence
Business Intelligence
● Skills, technologies, applications and practices to acquire a better understanding of the commercial context of your business
● Turning data into information useful for business users
– Management Information
– Decision Support
Business Intelligence Scope
● Operational (days, weeks): “Who's available for tomorrow's shift?”
– Customers, Partners, Employees
● Tactical (weeks, months): “In what region should we open a new store?”
– Analysts, Managers
● Strategic (months, years): “Should we become an appliance vendor instead of delivering software solutions?”
– Executives
● Tools across the levels: Reporting, OLAP/Analysis, Data mining
Functional Parts of a Business Intelligence Solution
● Front end Applications:
– Reports
– Charts and Graphs
– OLAP Pivot tables
– Data Mining
– Dashboards
● Back end Infrastructure:
– Data Integration
– Data Warehouse
– Data Mart
– Metadata
– (ROLAP) Cube
High Level BI Architecture
[Diagram. Sources: ERP, CRM, External Data. Back-end: Staging Area, Operational Datastore, Enterprise Data Warehouse, Datamarts. Front-end: Reporting, OLAP/Analysis, Charts/Graphs, Dashboards, Data Mining. Stages: Extract, Transform, Load, Present. Also shown: Meta Data.]
Part II: Data Warehousing
What is a Data Warehouse?
● A database designed to support Business Intelligence Applications
● Different requirements as compared to Operational Applications
● Analytic Database Systems (ADBMS)
– MySQL: Infobright, InfiniDB
– LucidDB, MonetDB
What is a Data Warehouse?
● Ultimately, it's just a Relational Database
– Tables, Columns, ...
● Designed for Business Intelligence applications
– Ease of use
– Performance
● Data from various source systems
– Integration, Standardization, Data cleaning
● Add and maintain history
– Corporate memory
What is a Data Warehouse?
● A database designed to support BI applications
● BI applications (OLAP) differ from Operational applications (OLTP)
– OLTP: Online Transaction Processing
– OLAP: Online Analytical Processing
● Differences:
– Applications, Data Processing, Data Model
OLTP vs OLAP: Application Characterization
● OLTP
– Operational
– 'Always' on
– All kinds of users
– Many users
– Directly supports business process
– Keep a Record of Current status
● OLAP
– Tactical, Strategic
– Periodically Available
– Managers, Directors
– Few(er) users
– Redesign Business Process
– Decision support, long-term planning
– Maintains a history
OLTP vs OLAP: Data Processing
● OLTP
– Transactions
– Subject Oriented
– Add, Modify, Remove single rows
– Human data entry
– Queries for small sets of rows with all their details
– Standard queries
● OLAP
– Groups
– Aspect Oriented
– Bulk load, rarely modify, never remove
– Automated ETL jobs
– Scan large sets to return aggregates over arbitrary groups
– Ad-hoc queries
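The two query shapes can be contrasted in a minimal sketch. It uses Python's built-in sqlite3 module; the `rental` table, its columns and its rows are made up for illustration and are not part of the slides.

```python
import sqlite3

# Hypothetical single-table example; names and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, store TEXT, month TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO rental VALUES (?, ?, ?, ?)",
    [
        (1, "North", "2008-10", 3.0),
        (2, "North", "2008-11", 5.0),
        (3, "South", "2008-10", 2.0),
        (4, "South", "2008-11", 2.5),
    ],
)

# OLTP style: fetch one row, by key, with all its details.
oltp_row = conn.execute("SELECT * FROM rental WHERE rental_id = ?", (3,)).fetchone()

# OLAP style: scan the whole table and aggregate over a chosen grouping.
olap_rows = conn.execute(
    "SELECT store, SUM(amount) FROM rental GROUP BY store ORDER BY store"
).fetchall()
```

The first query touches a single row through the primary key; the second has to read every row, whichever column is used for grouping.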
OLTP vs OLAP: Data Model
● OLTP
– Entity-Relationship model
– Entities, Attributes, Relationships
– Foreign key constraints
– Indexes to increase performance
– Normalized to 3NF or BCNF
● OLAP
– Dimensional model
– Facts, Dimensions, Hierarchies
– Referential integrity ensured in the loading process
– Scans on the Fact table leave indexes of little use
– Denormalized Dimensions (<= 1NF)
Part III: Dimensional Model
What is the Dimensional Model?
● An aspect-oriented logical data model optimized for querying and data presentation
● Divides data in two kinds:
– Facts
– Dimensions
The Dimensional Model
● Facts
– Measures/Metrics of a Business Process
– Examples: Cost, Units Sold, Profit
● Dimensions
– Context of Business Process
– Who? What? Where? When? Why?
– Navigate Facts: Selection, Rollup, Drilldown
– Provide and maintain history
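Dimensional Data Presentation
Columns: Date Dimension (2008 Q4). Rows: Location Dimension.

| Location | Region | All Months | October | November | December |
|---|---|---|---|---|---|
| All locations | | $ 3850 | $ 1000 | $ 1350 | $ 1500 |
| America | All America | $ 2050 | $ 500 | $ 750 | $ 800 |
| America | North | $ 1275 | $ 300 | $ 500 | $ 475 |
| America | South | $ 775 | $ 200 | $ 250 | $ 325 |
| Europe | All Europe | $ 1800 | $ 500 | $ 600 | $ 700 |
| Europe | East | $ 800 | $ 250 | $ 250 | $ 300 |
| Europe | West | $ 1000 | $ 250 | $ 350 | $ 400 |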
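The subtotal rows in the table above are roll-ups of the leaf-level cells. A small sketch of that arithmetic follows; the amounts are copied from the table, while the `roll_up` helper and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict

# Leaf-level facts from the table above: (continent, region, month) -> amount.
leaf = {
    ("America", "North", "October"): 300,
    ("America", "North", "November"): 500,
    ("America", "North", "December"): 475,
    ("America", "South", "October"): 200,
    ("America", "South", "November"): 250,
    ("America", "South", "December"): 325,
    ("Europe", "East", "October"): 250,
    ("Europe", "East", "November"): 250,
    ("Europe", "East", "December"): 300,
    ("Europe", "West", "October"): 250,
    ("Europe", "West", "November"): 350,
    ("Europe", "West", "December"): 400,
}


def roll_up(facts, keep):
    """Sum amounts, keeping only the key positions listed in `keep`."""
    totals = defaultdict(int)
    for key, amount in facts.items():
        totals[tuple(key[i] for i in keep)] += amount
    return dict(totals)


by_continent_month = roll_up(leaf, keep=(0, 2))  # e.g. America in October
by_continent = roll_up(leaf, keep=(0,))          # "All America", "All Europe"
grand_total = roll_up(leaf, keep=())             # "All locations", "All Months"
```

Dropping a key position is a roll-up; adding one back is a drill-down.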
The Dimensional Model: Facts
● Fact table structure:
– Several measures
– Keys to dimension tables
● Measures:
– Usually numeric, Additive, Semi-additive
– Sometimes pre-calculated
● Rapidly growing!
– Millions, Billions of rows (Terabytes)
The Dimensional Model: Dimensions
● Dimension table structure:
– Surrogate key and descriptive text attributes
● Relatively few rows
– Exception: Customer 'Monster' dimension
● Relatively static
– Exception: Slowly changing dimensions
● Used to navigate through fact data
– Hierarchies
The Dimensional Model: Navigating data with Dimensions
● Selection (Filter)
● Navigation: Attributes organized in Hierarchies
– Date dimension examples:
  ● Year, Quarter, Month, Day
  ● Year, Week, Day
● Groupings for Aggregation
– 'Roll up', 'Drill Down'
– 'Slice and Dice'
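A date dimension carrying both hierarchies can be generated rather than extracted from a source. The sketch below is one possible layout; the column names (`date_key`, `iso_week` and so on) are assumptions, not taken from the slides. Weeks get their own year column because ISO weeks do not nest inside months or calendar years.

```python
from datetime import date, timedelta


def date_dimension(start, end):
    """Build one row per day with attributes for both date hierarchies."""
    rows = []
    day = start
    while day <= end:
        iso_year, iso_week, _ = day.isocalendar()
        rows.append({
            "date_key": int(day.strftime("%Y%m%d")),  # surrogate key, e.g. 20081001
            # Hierarchy 1: Year, Quarter, Month, Day
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "day": day.day,
            # Hierarchy 2: Year, Week, Day (ISO week numbering)
            "iso_year": iso_year,
            "iso_week": iso_week,
        })
        day += timedelta(days=1)
    return rows


dim_date = date_dimension(date(2008, 10, 1), date(2008, 12, 31))
```

Rolling up along either hierarchy is then a matter of grouping on the chosen columns.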
The Dimensional Model: Maintaining History
● Fact table usually links to a date dimension
● Dimensions maintain their own history
– Slowly changing dimensions
  ● Type I: Overwrite (no history)
  ● Type II: History kept in rows (versioning)
  ● Type III: History kept in columns
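Type II versioning can be sketched in a few lines. This is a simplified illustration, not how any particular ETL tool implements it; the `valid_from` and `valid_to` columns and the `apply_type2` helper are hypothetical names.

```python
from datetime import date


def apply_type2(dimension, customer_id, new_attrs, today):
    """Type II: close the current version of a row and append a new one."""
    current = next(
        (r for r in dimension
         if r["customer_id"] == customer_id and r["valid_to"] is None),
        None,
    )
    if current is not None:
        if all(current[k] == v for k, v in new_attrs.items()):
            return dimension  # nothing changed, keep the current version
        current["valid_to"] = today  # close the old version
    next_key = max((r["customer_key"] for r in dimension), default=0) + 1
    dimension.append({
        "customer_key": next_key,    # new surrogate key per version
        "customer_id": customer_id,  # natural key stays the same
        **new_attrs,
        "valid_from": today,
        "valid_to": None,            # None marks the current version
    })
    return dimension


dim_customer = []
apply_type2(dim_customer, 7, {"city": "Lethbridge"}, date(2008, 1, 1))
apply_type2(dim_customer, 7, {"city": "Woodridge"}, date(2008, 6, 1))  # customer moved
apply_type2(dim_customer, 7, {"city": "Woodridge"}, date(2008, 7, 1))  # no change
```

Facts loaded before the move keep pointing at the first surrogate key, so old rentals still report under the old city. A Type I load would instead overwrite `city` in place.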
Part IV: Physical Implementation
Dimensional Model Implementation: Star Schema
● Related metrics stored in a Fact table
● Fact table references relevant dimensions
● Each Dimension stored in a Dimension Table
● Dimension tables shared by multiple fact tables
Star Schema example: Sakila Rentals
[Diagram: a Rentals fact table surrounded by the dimensions Store, Date, Time, Film, Customer, Staff]
Star Schema example: Sakila Rentals
[Diagram: fact tables fact_rental, fact_inventory, fact_payment sharing the dimension tables dim_date, dim_customer, dim_store, dim_staff, dim_film]
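A cut-down version of such a star can be tried out with Python's built-in sqlite3 module. The table names follow the diagram; the columns and sample rows are assumptions for illustration, and only three of the dimensions are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: surrogate key plus descriptive attributes (columns assumed).
CREATE TABLE dim_date  (date_key  INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_film  (film_key  INTEGER PRIMARY KEY, title TEXT, category TEXT);
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, city TEXT, country TEXT);

-- Fact table: keys to the dimensions plus the measures.
CREATE TABLE fact_rental (
    date_key     INTEGER REFERENCES dim_date(date_key),
    film_key     INTEGER REFERENCES dim_film(film_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    rental_count INTEGER
);

-- Made-up sample rows.
INSERT INTO dim_date  VALUES (20081001, 2008, 10), (20081101, 2008, 11);
INSERT INTO dim_film  VALUES (1, 'Film A', 'Action'), (2, 'Film B', 'Comedy');
INSERT INTO dim_store VALUES (1, 'Lethbridge', 'Canada'), (2, 'Woodridge', 'Australia');
INSERT INTO fact_rental VALUES
    (20081001, 1, 1, 1), (20081001, 2, 1, 1),
    (20081101, 1, 2, 1), (20081101, 1, 1, 1);
""")

# A typical star query: join the fact to its dimensions, group by attributes.
rentals_by_country_category = conn.execute("""
    SELECT s.country, f.category, SUM(r.rental_count)
    FROM fact_rental AS r
    JOIN dim_store AS s ON s.store_key = r.store_key
    JOIN dim_film  AS f ON f.film_key  = r.film_key
    GROUP BY s.country, f.category
    ORDER BY s.country, f.category
""").fetchall()
```

Every dimension is exactly one join away from the fact table, which keeps such queries uniform.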
Stars versus Snowflakes
● Star schema is 'just' an implementation
– Optimized for simplicity
– Optimized for performance (?)
– Heavily denormalized dimensions
● There is an alternative: Snowflake
– Normalized dimensions
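Snowflake example: Sakila Rentals
[Diagram: the Rentals fact table with normalized dimensions. Tables shown: Store, Customer, Staff, Film, Date, plus the outlying tables City and Country (each shown twice), Language, Rating, Minute, Hour, Week, Month, Quarter, Year]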
Part V: A Star is Born
Dimensional Model example
● MySQL Sample Database
– http://dev.mysql.com/doc/sakila/en/sakila.html
● DVD rental business
– Overly simplified database schema
● Typical OLTP database
3NF Source schema: Sakila Rentals
[Diagram: source tables Rental, Customer, Staff, Inventory, Store, Film, Category, Actor, Language, Address, City, Country]
Target Star Schema
● Fact: Rentals
● When? Date, Time
● Where? Store
● What? Film
● Who? Customer, Staff
Dimensional Design
● Select Business Process
– Sales, Purchase, Storage, ...
● Define Facts and Key Metrics
– Facts: Key Event in Business Process
– Metrics (Fact Attributes): Count or Amount
● Choose Dimensions and Hierarchies
– What? When? Where?
– Who? Why?
Example Business Process: Rentals
● Select Business Process
– Rentals
● Identify Facts
– Count (number of rentals)
– Rental Duration
● Choose Dimensions
– What: Films
– When: Rental, Return
– Who: Customer, Staff
– Where: Store
A star is born: Rentals 3NF
[Diagram: Rental linked to Customer, Staff, Inventory]
A star is born: Rentals 3NF
[Diagram: adds Store, Film, Film Category, Category]
A star is born: Denormalize
[Diagram: Inventory and Film Category no longer appear; Rental is shown with Customer, Staff, Store, Film, Category]
A star is born
[Diagram: adds Address]
A star is born: Denormalize
[Diagram: adds Language and a second Address]
A star is born: Denormalize
[Diagram: adds City, shown twice]
A star is born: Rental Snowflake
[Diagram: adds Country, shown twice]
A star is born: Rental Star Schema
[Diagram: the Rental fact with four denormalized dimensions. What: Film (with Language, Category). Who: Customer (with Address, City, Country). Where: Store (with Address, City, Country). Who: Staff (with Store).]
Dimensional Design
● Something is missing....
– Who? (Customer, Staff)
– What? (Film)
– Where? (Store)
– .... ?
A star is born: Rental Date and Time
[Diagram: the Rental fact with What: Film, Who: Customer, Where: Store, Who: Staff, plus When: Date and When: Time]
Role Playing: Date/Time for both Rentals and Returns
[Diagram: the Rental fact with What: Film, Who: Customer, Where: Store, Who: Staff, and four When roles: Rental Date, Rental Time, Return Date, Return Time]
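In SQL terms, a role-playing dimension is one physical table joined to the fact several times under different aliases. A minimal sketch with sqlite3 follows; the key column names `rental_date_key` and `return_date_key` and the sample rows are assumptions, and the Time dimension is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month_name TEXT);
-- The fact references the same dimension twice, once per role.
CREATE TABLE fact_rental (
    rental_date_key INTEGER REFERENCES dim_date(date_key),
    return_date_key INTEGER REFERENCES dim_date(date_key)
);
INSERT INTO dim_date VALUES
    (20081030, 'October'), (20081102, 'November'), (20081105, 'November');
INSERT INTO fact_rental VALUES (20081030, 20081102), (20081102, 20081105);
""")

# One physical dim_date table, joined once per role under its own alias.
rows = conn.execute("""
    SELECT rented.month_name, returned.month_name, COUNT(*)
    FROM fact_rental AS f
    JOIN dim_date AS rented   ON rented.date_key   = f.rental_date_key
    JOIN dim_date AS returned ON returned.date_key = f.return_date_key
    GROUP BY rented.month_name, returned.month_name
""").fetchall()
```

Some BI tools prefer one view per role over aliases in each query; either way there is only one date table to load and maintain.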
Rental Star Schema
Part VI: Filling the Data Warehouse
High Level Data Warehouse Architecture
[Diagram. Sources: ERP, CRM, External Data. Back-end: Staging Area, Data Warehouse, Datamarts. Front-end BI Applications: Reporting, Analysis, Visualizations, Dashboards, Data Mining. Stages: Extract, Transform, Load, Present. Also shown: Meta Data.]
Planning the ETL Process
● Physical Design
● Source to Target Mapping
– Define how data in the data warehouse is derived from data in the source system(s)
– Specification for designing the ETL process
● Column-level mapping
– Source system, schema, table, column, data type
– Target dimension/fact, column, defaults
– Transformation rules, cleansing, lookup, calculation
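A column-level mapping can be kept as data and applied mechanically. The sketch below is one hypothetical way to write it down; the `ColumnMapping` class, the target column names and the transformation rules are assumptions for illustration, not a Kettle feature.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ColumnMapping:
    """One line of a source-to-target mapping specification."""
    source_table: str
    source_column: str
    target_table: str
    target_column: str
    transform: Optional[Callable[[Any], Any]] = None  # transformation rule
    default: Any = None                               # used when the source is NULL

    def apply(self, source_row: dict) -> Any:
        value = source_row.get(self.source_column)
        if value is None:
            return self.default
        return self.transform(value) if self.transform else value


# Hypothetical mapping from a source customer table to a customer dimension.
MAPPINGS = [
    ColumnMapping("customer", "first_name", "dim_customer", "customer_first_name",
                  transform=str.title),
    ColumnMapping("customer", "active", "dim_customer", "customer_active",
                  transform=lambda v: "Yes" if v else "No"),
    ColumnMapping("customer", "email", "dim_customer", "customer_email",
                  default="unknown"),
]


def map_row(source_row, mappings):
    """Derive one target row from one source row."""
    return {m.target_column: m.apply(source_row) for m in mappings}


target_row = map_row({"first_name": "MARY", "active": 1, "email": None}, MAPPINGS)
```

Keeping the mapping as data means the same list can serve as documentation and as a specification for the ETL design.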
Designing the ETL Process
● Staging?
● Changed Data Capture / Extraction
● Denormalization
● Derived data / Enrichment
● Cleansing / Conforming
● History policy (dimensions)
● Granularity
● Dimension Lookup (facts)
Designing ETL with Kettle
● Flow ETL Engine
● Transformations
– Data flow and processing
● Jobs
– Workflow of ETL tasks
● Tools
– Spoon
– Kitchen
– Pan
Loading a Fact Table
● Load Dimension Tables
● Load Fact table
Loading a Dimension Table
● Get Customers source data
● Lookup Address (Denormalize)
● Update Dimension
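The three steps above can be sketched in plain Python. This is an illustration of the logic only, not the Kettle transformation shown in the talk; the row layouts are simplified assumptions (the address carries its city directly), and the update uses a Type I overwrite for brevity.

```python
def load_dim_customer(customers, addresses, dim_customer):
    """Load customers into a dimension list, denormalizing the address lookup."""
    address_by_id = {a["address_id"]: a for a in addresses}
    by_natural_key = {row["customer_id"]: row for row in dim_customer}
    next_key = max((row["customer_key"] for row in dim_customer), default=0) + 1

    for c in customers:  # Get Customers source data
        # Lookup Address (Denormalize): copy address attributes onto the row.
        address = address_by_id.get(c["address_id"], {})
        attrs = {
            "customer_name": f'{c["first_name"]} {c["last_name"]}',
            "customer_city": address.get("city", "unknown"),
        }
        # Update Dimension: insert new customers, overwrite known ones.
        existing = by_natural_key.get(c["customer_id"])
        if existing is None:
            row = {"customer_key": next_key, "customer_id": c["customer_id"], **attrs}
            dim_customer.append(row)
            by_natural_key[c["customer_id"]] = row
            next_key += 1
        else:
            existing.update(attrs)  # Type I; Type II would add a new version
    return dim_customer


addresses = [{"address_id": 5, "city": "Sasebo"}]
customers = [{"customer_id": 1, "first_name": "Mary", "last_name": "Smith", "address_id": 5}]
dim_customer = load_dim_customer(customers, addresses, [])
```

Running the load again with unchanged source data leaves the dimension as it is, which makes the job safe to repeat.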
Loading a Fact Table
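Loading the fact table comes down to replacing each source key with the matching dimension key. A hedged sketch of that dimension lookup follows; the function, the row layouts and the reject handling are assumptions for illustration and do not reproduce the Kettle transformation from the talk.

```python
def load_fact_rental(rentals, dim_customer, date_keys):
    """Turn source rentals into fact rows by looking up dimension keys."""
    customer_key = {row["customer_id"]: row["customer_key"] for row in dim_customer}
    facts, rejected = [], []
    for r in rentals:
        ck = customer_key.get(r["customer_id"])
        dk = int(r["rental_date"].replace("-", ""))  # '2008-10-30' -> 20081030
        if ck is None or dk not in date_keys:
            # No foreign key constraints in the warehouse: integrity is checked here.
            rejected.append(r)
            continue
        facts.append({"customer_key": ck, "rental_date_key": dk, "rental_count": 1})
    return facts, rejected


dim_customer = [{"customer_key": 10, "customer_id": 1}]
date_keys = {20081030}
rentals = [
    {"rental_id": 100, "customer_id": 1, "rental_date": "2008-10-30"},
    {"rental_id": 101, "customer_id": 2, "rental_date": "2008-10-30"},  # unknown customer
]
facts, rejected = load_fact_rental(rentals, dim_customer, date_keys)
```

This is why dimensions are loaded first: a fact row can only be written once all of its dimension keys can be resolved.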
Part VII: Presenting the Data: BI Applications
Reporting
● Mostly Operational
● Lists and Grouping
● Typically standardized
● Typically no or limited interactivity
– Subreporting
Scope of Reporting
[Diagram: the Business Intelligence Scope figure repeated to situate Reporting, which is mostly Operational]
Analysis
![Page 68: A Data Warehousing and Business Intelligence Tutorial · A Data Warehousing and Business Intelligence Tutorial. Starring Sakila Welcome! Matt Casters Chief Data Integration, Pentaho](https://reader031.vdocuments.mx/reader031/viewer/2022020717/5ad6cd2b7f8b9ab8378b6c38/html5/thumbnails/68.jpg)
Analysis
● Tactical, Strategic
● OLAP
– Online Analytical Processing
● Pivot tables
● Typically Interactive
– Slice and Dice
– Drilldown
● Typically Ad-hoc
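The pivot, slice, dice and drilldown operations listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how Mondrian or any OLAP engine is implemented: the `FACTS` rows, the `COLUMNS` names and the helper functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, store, category, revenue).
COLUMNS = ("year", "quarter", "store", "category", "revenue")
FACTS = [
    (2005, "Q2", "Store 1", "Action", 10.0),
    (2005, "Q2", "Store 2", "Action", 5.0),
    (2005, "Q3", "Store 1", "Comedy", 7.0),
    (2005, "Q3", "Store 2", "Action", 3.0),
    (2006, "Q1", "Store 1", "Action", 4.0),
]

def pivot(facts, row_dim, col_dim):
    """Sum revenue into a {row_value: {col_value: total}} pivot table."""
    r, c = COLUMNS.index(row_dim), COLUMNS.index(col_dim)
    table = defaultdict(lambda: defaultdict(float))
    for fact in facts:
        table[fact[r]][fact[c]] += fact[-1]
    return {row: dict(cols) for row, cols in table.items()}

def slice_(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    i = COLUMNS.index(dim)
    return [f for f in facts if f[i] == value]

def dice(facts, **allowed):
    """Dice: keep facts whose value is in the allowed set for each named dimension."""
    idx = {dim: COLUMNS.index(dim) for dim in allowed}
    return [f for f in facts if all(f[idx[d]] in allowed[d] for d in allowed)]

if __name__ == "__main__":
    # Top-level view: revenue per year and store.
    print(pivot(FACTS, "year", "store"))
    # Drilldown: open up 2005 and look at its quarters.
    print(pivot(slice_(FACTS, "year", 2005), "quarter", "store"))
    # Dice: a sub-cube restricted on two dimensions at once.
    print(dice(FACTS, store={"Store 1"}, category={"Action"}))
```

Each call is an ad-hoc question decided by the user at analysis time, which is what separates this style from a standardized report.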
![Page 69: A Data Warehousing and Business Intelligence Tutorial · A Data Warehousing and Business Intelligence Tutorial. Starring Sakila Welcome! Matt Casters Chief Data Integration, Pentaho](https://reader031.vdocuments.mx/reader031/viewer/2022020717/5ad6cd2b7f8b9ab8378b6c38/html5/thumbnails/69.jpg)
Scope of OLAP & Analysis
● Operational (days, weeks): Customers, Partners, Employees
– Who's available for tomorrow's shift?
● Tactical (weeks, months): Analysts, Managers
– In what region should we open a new store?
● Strategic (months, years): Executives
– Should we become an appliance vendor instead of delivering software solutions?
● BI applications: Reporting, OLAP/Analysis, Data mining
![Page 70: A Data Warehousing and Business Intelligence Tutorial · A Data Warehousing and Business Intelligence Tutorial. Starring Sakila Welcome! Matt Casters Chief Data Integration, Pentaho](https://reader031.vdocuments.mx/reader031/viewer/2022020717/5ad6cd2b7f8b9ab8378b6c38/html5/thumbnails/70.jpg)
Analysis: Interactive Pivot table
![Page 71: A Data Warehousing and Business Intelligence Tutorial · A Data Warehousing and Business Intelligence Tutorial. Starring Sakila Welcome! Matt Casters Chief Data Integration, Pentaho](https://reader031.vdocuments.mx/reader031/viewer/2022020717/5ad6cd2b7f8b9ab8378b6c38/html5/thumbnails/71.jpg)
Data Mining
![Page 72: A Data Warehousing and Business Intelligence Tutorial · A Data Warehousing and Business Intelligence Tutorial. Starring Sakila Welcome! Matt Casters Chief Data Integration, Pentaho](https://reader031.vdocuments.mx/reader031/viewer/2022020717/5ad6cd2b7f8b9ab8378b6c38/html5/thumbnails/72.jpg)
Data Mining
● Strategic, Tactical
● Discover hidden patterns in data
● Machine learning
● Statistical analysis
● Typically not interactive, long running
● A matter for experts
● Not readily consumable by end-users
– Characteristics of back-end processing
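The slides do not name a specific mining technique. As one simple illustration of "discovering hidden patterns", the sketch below counts which film categories tend to be rented together, a building block of association-rule mining. The `BASKETS` data, the `frequent_pairs` function and the `min_support` threshold are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: film categories rented together by one customer.
BASKETS = [
    {"Action", "Sci-Fi"},
    {"Action", "Sci-Fi", "Comedy"},
    {"Comedy", "Drama"},
    {"Action", "Sci-Fi", "Drama"},
]

def frequent_pairs(baskets, min_support):
    """Return {pair: support} for pairs found in at least min_support of all baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        counts.update(combinations(sorted(basket), 2))
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}

if __name__ == "__main__":
    print(frequent_pairs(BASKETS, min_support=0.5))
```

On real data the number of candidate pairs grows quickly, which is one reason such jobs tend to run as long, non-interactive back-end processes.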
![Page 73: A Data Warehousing and Business Intelligence Tutorial · A Data Warehousing and Business Intelligence Tutorial. Starring Sakila Welcome! Matt Casters Chief Data Integration, Pentaho](https://reader031.vdocuments.mx/reader031/viewer/2022020717/5ad6cd2b7f8b9ab8378b6c38/html5/thumbnails/73.jpg)
Scope of Data Mining
● Operational (days, weeks): Customers, Partners, Employees
– Who's available for tomorrow's shift?
● Tactical (weeks, months): Analysts, Managers
– In what region should we open a new store?
● Strategic (months, years): Executives
– Should we become an appliance vendor instead of delivering software solutions?
● BI applications: Reporting, OLAP/Analysis, Data mining
![Page 74: A Data Warehousing and Business Intelligence Tutorial · A Data Warehousing and Business Intelligence Tutorial. Starring Sakila Welcome! Matt Casters Chief Data Integration, Pentaho](https://reader031.vdocuments.mx/reader031/viewer/2022020717/5ad6cd2b7f8b9ab8378b6c38/html5/thumbnails/74.jpg)
Data Mining
![Page 75: A Data Warehousing and Business Intelligence Tutorial · A Data Warehousing and Business Intelligence Tutorial. Starring Sakila Welcome! Matt Casters Chief Data Integration, Pentaho](https://reader031.vdocuments.mx/reader031/viewer/2022020717/5ad6cd2b7f8b9ab8378b6c38/html5/thumbnails/75.jpg)
Charts and Graphs
![Page 76: A Data Warehousing and Business Intelligence Tutorial · A Data Warehousing and Business Intelligence Tutorial. Starring Sakila Welcome! Matt Casters Chief Data Integration, Pentaho](https://reader031.vdocuments.mx/reader031/viewer/2022020717/5ad6cd2b7f8b9ab8378b6c38/html5/thumbnails/76.jpg)
Charts and Graphs
● Operational, Tactical, Strategic
● Summarize large datasets
● Not a separate class but a presentation
– Data Visualization
● Standardized or ad-hoc
● Can be interactive
– Drive a subreport
– Drive drilldown
![Page 77: A Data Warehousing and Business Intelligence Tutorial · A Data Warehousing and Business Intelligence Tutorial. Starring Sakila Welcome! Matt Casters Chief Data Integration, Pentaho](https://reader031.vdocuments.mx/reader031/viewer/2022020717/5ad6cd2b7f8b9ab8378b6c38/html5/thumbnails/77.jpg)
Dashboarding
![Page 78: A Data Warehousing and Business Intelligence Tutorial · A Data Warehousing and Business Intelligence Tutorial. Starring Sakila Welcome! Matt Casters Chief Data Integration, Pentaho](https://reader031.vdocuments.mx/reader031/viewer/2022020717/5ad6cd2b7f8b9ab8378b6c38/html5/thumbnails/78.jpg)
Dashboarding
● Operational, Tactical, Strategic
● Not a separate class but a presentation
● Bundle:
– key metrics for a particular role or perspective
– different views on the same metrics
● Can contain reports, pivot tables, charts, graphs
● Typically interactive
![Page 79: A Data Warehousing and Business Intelligence Tutorial · A Data Warehousing and Business Intelligence Tutorial. Starring Sakila Welcome! Matt Casters Chief Data Integration, Pentaho](https://reader031.vdocuments.mx/reader031/viewer/2022020717/5ad6cd2b7f8b9ab8378b6c38/html5/thumbnails/79.jpg)
Dashboard