Chapter 14: The Data Warehouse. Fundamentals of Database Management Systems by Mark L. Gillenson, Ph.D., University of Memphis. Presentation by: Amita Goyal Chin, Ph.D., Virginia Commonwealth University. John Wiley & Sons, Inc.

Upload: theodore-shannon-holland

Post on 05-Jan-2016


Chapter 14
The Data Warehouse

Fundamentals of Database Management Systems
by

Mark L. Gillenson, Ph.D.

University of Memphis

Presentation by: Amita Goyal Chin, Ph.D.

Virginia Commonwealth University

John Wiley & Sons, Inc.

14-2

Chapter Objectives

Compare the data needs of transaction processing systems with those of decision support systems.

Describe the data warehouse concept and list its main features.

Compare the enterprise data warehouse with the data mart.

14-3

Chapter Objectives

Design a data warehouse.

Build a data warehouse, including the steps of data extraction, data cleaning, data transformation, and data loading.

Describe how to use a data warehouse with online analytic processing and data mining.

14-4

Chapter Objectives

List the types of expertise needed to administer a data warehouse.

List the challenges in data warehousing.

14-5

Application Systems

Transaction Processing Systems (TPS)
Everyday application systems that support banking and insurance operations, manage the parts inventory on manufacturing assembly lines, keep track of airline and hotel reservations, support Web-based sales, etc.

Decision Support Systems (DSS)
Specifically designed to aid managers in decision-making tasks.

14-6

The Data Warehouse Concept

A data warehouse is a broad-based, shared database for management decision making that contains data that has been accumulated over time.

Formally, a data warehouse is “a subject oriented, integrated, non-volatile, and time variant collection of data in support of management’s decisions.”

14-7

Characteristics of Data Warehouse Data

The data is subject oriented
The data is integrated
The data is non-volatile
The data is time variant
The data must be high quality
The data may be aggregated
The data is often denormalized
The data is not necessarily absolutely current

14-8

The Data is Subject Oriented

Data warehouses are organized around subjects, which are really the major entities of concern in the business environment.
Examples: sales, customers, orders, claims, accounts, employees, and other entities that are central to the company’s business.

14-9

The Data is Integrated

Data about each of the subjects in the data warehouse is typically collected from several of the company’s transactional databases, each of which supports one or more applications that have something to do with the particular subject.

All of the data about a subject must be organized or integrated in such a way that it provides a unified, overall picture of all the important details about the subject over time.

Data from disparate application databases must be transformed into common measurements, codes, and data types.

14-10

The Data is Non-Volatile

Once data is added to the data warehouse, it doesn’t change.

It will never change. Changing it would be like going back and rewriting history.

14-11

The Data is Time Variant

Data warehouse data, with its historic nature, always includes some kind of a timestamp.

If we are storing sales data on a weekly or monthly basis and we have accumulated ten years of such historic data, each weekly or monthly sales figure must be accompanied by a timestamp indicating the week or month (and year!) that it represents.
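As a rough illustration of the timestamp requirement, the sketch below stores each monthly sales figure together with the year and month it represents. The class name `MonthlySales`, the helper `sales_for`, and all figures are hypothetical and are not taken from the textbook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonthlySales:
    """One aggregated sales figure, stamped with the period it represents."""
    year: int      # the year is stored explicitly, not implied
    month: int     # 1..12
    amount: float  # total sales for that month (made-up values)


# Three March figures that only make sense because each carries its year.
history = [
    MonthlySales(2014, 3, 18250.00),
    MonthlySales(2015, 3, 21400.50),
    MonthlySales(2016, 3, 19875.25),
]


def sales_for(history, year, month):
    """Return the figure recorded for one period, or None if it is absent."""
    for row in history:
        if (row.year, row.month) == (year, month):
            return row.amount
    return None
```

Without the year, the three March rows would be indistinguishable, which is the point the "(and year!)" remark makes.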

14-12

The Data Must Be High Quality

Consider a section of a data warehouse in which the subject is customer.

If there is a customer address misspelling in one transactional file, when the data from that file is integrated with the data from the other transactional files, there will be some difficulty in reconciling whether the two different addresses both represent one customer, or whether they actually represent two different customers.

This must be reconciled as the data is integrated and entered into the data warehouse.

14-13

The Data May Be Aggregated

The type of data that management requires for decision making is generally summarized data.

The sheer volume of all the historic detail data would make the data warehouse unacceptably huge in many cases.

If the detail data were stored in the data warehouse, the amount of time that it would take to summarize the data for management every time a query was posed would often be unacceptable.
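A minimal sketch of the aggregation idea, assuming detail rows of the form (date, product, quantity): the rows are summarized once into monthly totals, so later queries read the summary instead of rescanning the detail. The data and the function name `aggregate_monthly` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical detail rows: (ISO date, product, quantity sold).
detail_sales = [
    ("2015-01-05", "hammer", 3),
    ("2015-01-19", "hammer", 2),
    ("2015-01-19", "wrench", 1),
    ("2015-02-02", "hammer", 4),
]


def aggregate_monthly(rows):
    """Sum quantities per (year-month, product)."""
    totals = defaultdict(int)
    for date, product, quantity in rows:
        period = date[:7]  # "YYYY-MM" prefix of the ISO date
        totals[(period, product)] += quantity
    return dict(totals)


# Computed once at load time; queries then use the much smaller summary.
monthly_sales = aggregate_monthly(detail_sales)
```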

14-14

The Data is Often Denormalized

If a company is willing to tolerate the substantial additional space taken up by the redundant denormalized data, it can gain the advantage of the improved query performance that redundancy provides without paying the penalties of increased update time and potential data integrity problems.

This works because the data integrity problems that can be caused by redundant data only arise when the data is updated. The historic data in the data warehouse will not be updated.
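The trade-off can be sketched as follows, assuming a small product lookup and a list of sales rows (all names and values are hypothetical). Product attributes are copied into every sales row on purpose; the copies cost space, but a query no longer needs a lookup or join.

```python
# Normalized form: product details stored once, sales refer to them by key.
products = {101: {"name": "hammer", "unit_price": 12.50}}
sales = [
    {"period": "2015-01", "product_id": 101, "quantity": 5},
    {"period": "2015-02", "product_id": 101, "quantity": 4},
]


def denormalize(sales, products):
    """Copy product attributes into every sale row (redundant on purpose)."""
    wide_rows = []
    for sale in sales:
        product = products[sale["product_id"]]
        wide_rows.append({
            **sale,
            "product_name": product["name"],
            "unit_price": product["unit_price"],
        })
    return wide_rows


wide_sales = denormalize(sales, products)

# A query over the wide rows needs no lookup into `products`.
revenue = sum(row["quantity"] * row["unit_price"] for row in wide_sales)
```

Because warehouse rows are never updated after loading, the repeated `product_name` and `unit_price` values cannot drift out of step with each other.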

14-15

The Data is Not Necessarily Absolutely Current

Data warehouse data is updated at some time interval (weekly, monthly, etc.).

Any changes since the last data warehouse update are not recorded in it until the next scheduled update.

This lag is inconsequential when looking at long-term trends.

14-16

Types of Data Warehouses

Enterprise Data Warehouse (EDW)

Data Mart (DM)

14-17

Enterprise Data Warehouse

Large-scale; incorporates the data of an entire company or of a major division, site, or activity of a company.

A full-scale EDW is built around several different subjects.

EDWs support a wide variety of DSS applications and serve as a data resource with which company managers can explore new ways of using the company’s data to its advantage.

14-18

The Data Mart

Small-scale; designed to support a small part of an organization.

A company will often have several DMs.

DMs are based on a limited number of subjects (possibly one) and are constructed from a limited number of transactional databases.

14-19

Which to Choose: The EDW, the DM, or Both?

It varies from company to company.

Top-down development implies that the EDW was created first and then later data was extracted from the EDW to create one or more DMs.

A company that has deliberately or as a matter of circumstance developed a series of independent DMs may decide, in a bottom-up development fashion, to build an EDW out of the existing DMs.

14-20

Designing a Data Warehouse

Two characteristics of data warehouses are central to any design:
The subject orientation.
The historic nature of the data.

Data warehouses are often referred to as multidimensional databases because each occurrence of the subject is referenced by an occurrence of each of several dimensions or characteristics of the subject, one of which is time.

14-21

Multidimensional Databases

Two dimensions can easily be visualized on a flat piece of paper.

14-22

Multidimensional Databases

Three dimensions can easily be visualized on a flat piece of paper as a cube.

Four or more dimensions are more difficult to visualize.

14-23

Storing Multidimensional Data

There is much interest in storing multidimensional data in relational databases.

The star schema.
A visual design in which the subject is in the middle and the dimensions radiate outwards.

It has a “fact table,” which represents the data warehouse “subject,” and several “dimension tables.”

14-24

General Hardware Company Data Warehouse

Here is the General Hardware transactional database.

14-25

General Hardware Company Data Warehouse

SALE is the fact table.
Like any relational table, it must have a primary key.

Dimension tables:
SALESPERSON
PRODUCT
TIME PERIOD
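The star layout can be sketched with Python's built-in `sqlite3` module. Only the table names SALE, SALESPERSON, PRODUCT, and TIME PERIOD come from the slide; every column name, the composite primary key on the fact table, and all rows are assumptions made for illustration, since the slide's diagram is not reproduced in this transcript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (column names are assumed).
CREATE TABLE salesperson (salesperson_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product     (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE time_period (period_id INTEGER PRIMARY KEY,
                          year INTEGER, month INTEGER);

-- Fact table in the middle of the star. Here its primary key is the
-- combination of the dimension keys (an assumption of this sketch).
CREATE TABLE sale (
    salesperson_id INTEGER REFERENCES salesperson(salesperson_id),
    product_id     INTEGER REFERENCES product(product_id),
    period_id      INTEGER REFERENCES time_period(period_id),
    quantity       INTEGER,
    PRIMARY KEY (salesperson_id, product_id, period_id)
);

INSERT INTO salesperson VALUES (1, 'Baker'), (2, 'Adams');
INSERT INTO product     VALUES (10, 'hammer'), (20, 'wrench');
INSERT INTO time_period VALUES (100, 2015, 1), (101, 2015, 2);
INSERT INTO sale VALUES (1, 10, 100, 5), (1, 20, 100, 2),
                        (2, 10, 100, 7), (2, 10, 101, 3);
""")

# Total quantity per product per month: join the fact table outward
# to the two dimensions the question mentions.
rows = conn.execute("""
    SELECT p.name, t.year, t.month, SUM(s.quantity)
    FROM sale s
    JOIN product p     ON p.product_id = s.product_id
    JOIN time_period t ON t.period_id  = s.period_id
    GROUP BY p.name, t.year, t.month
    ORDER BY p.name, t.year, t.month
""").fetchall()
```

Each query starts at the fact table and joins only the dimensions it needs, which is what gives the design its star shape.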

14-26

General Hardware Company Data Warehouse

14-27

Good Reading Bookstores Data Warehouse

Do they need a data warehouse, since they already store a date attribute?

Yes, for two reasons:
While the transactional database performs acceptably with perhaps the last couple of months of data in it, its performance would degrade to an unacceptable level if we tried to keep ten years of data in it.

The kinds of management decision making that require long-term historic sales data require aggregate, not daily, data.

14-28

Good Reading Bookstores Data Warehouse

SALE is the fact table.
Like any relational table, it must have a primary key.

Dimension tables:
BOOK
PUBLISHER
CUSTOMER
TIME PERIOD

Snowflake design
One dimension table (BOOK) leads to another dimension table (PUBLISHER).
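A small `sqlite3` sketch of the snowflake idea follows: the fact table reaches PUBLISHER only by way of BOOK. The CUSTOMER and TIME PERIOD dimensions are left out for brevity (the period is kept as a plain text column), and all column names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publisher (publisher_id INTEGER PRIMARY KEY, name TEXT);

-- BOOK is a dimension table that itself points at another dimension.
CREATE TABLE book (
    book_id      INTEGER PRIMARY KEY,
    title        TEXT,
    publisher_id INTEGER REFERENCES publisher(publisher_id)
);

CREATE TABLE sale (
    book_id  INTEGER REFERENCES book(book_id),
    period   TEXT,
    quantity INTEGER,
    PRIMARY KEY (book_id, period)
);

INSERT INTO publisher VALUES (1, 'Publisher A'), (2, 'Publisher B');
INSERT INTO book VALUES (10, 'Book X', 1), (11, 'Book Y', 1), (12, 'Book Z', 2);
INSERT INTO sale VALUES (10, '2015-01', 4), (11, '2015-01', 6), (12, '2015-01', 5);
""")

# Two hops: SALE -> BOOK -> PUBLISHER.
sales_by_publisher = conn.execute("""
    SELECT pub.name, SUM(s.quantity)
    FROM sale s
    JOIN book b        ON b.book_id = s.book_id
    JOIN publisher pub ON pub.publisher_id = b.publisher_id
    GROUP BY pub.name
    ORDER BY pub.name
""").fetchall()
```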

14-29

Lucky Rent-A-Car Data Warehouse

RENTAL is the fact table.
It does not contain aggregated data.

Dimension tables:
CAR
MANUFACTURER
CUSTOMER
TIME PERIOD

Snowflake design
One dimension table (CAR) leads to another dimension table (MANUFACTURER).

14-30

What About a World Music Association Data Warehouse?

There is already a Year attribute in the RECORDING table.

The essence of the WMA data is historic.

By its nature, the amount of data in a WMA type transactional database is much lower than the amount of data in a Good Reading or Lucky-type transactional database.

Since the nature of the WMA transactional database blurs with what a WMA data warehouse would look like, no WMA data warehouse is needed.

14-31

Building a Data Warehouse

Data Extraction

Data Cleaning

Data Transformation

Data Loading

14-32

Building a Data Warehouse: Data Extraction

Process of copying the data from the transactional databases in preparation for loading it into the data warehouse.

This is not a one-time event.

The data is likely to come from several transactional databases.

Some of the data entering into this process may come from outside of the company (data enrichment).

14-33

Lucky Rent-A-Car with Enrichment Data

In the CUSTOMER table, Customer Age, Customer Income, and Customer Education are the enrichment data.

14-34

Data Cleaning

Transactional data can have all kinds of errors in it.

Data warehouses are very sensitive to data errors.
Data errors must be “cleaned” or “cleansed” or “scrubbed” as the data is loaded into the data warehouse.

14-35

Data Cleaning

There are two steps to cleaning transactional data in preparation for loading it into a data warehouse.

Identify the problem data.
• Due to the massive volume of data, this is typically done using a program.

Fix it.
• This can be handled by using sophisticated artificial intelligence programs or by creating exception reports for employees to scrutinize.

14-36

Good Reading Bookstores Before Data Cleaning

Errors in Customer:
Missing data - in row 1, city is blank.
Questionable data - the state for rows 2 & 6 should be the same.

14-37

Good Reading Bookstores Before Data Cleaning

Errors in Customer:
Possible misspelling - do rows 3 & 8 refer to the same person?
Impossible data - row 10’s state “RP” is wrong.
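A program that identifies these kinds of customer problems might look roughly like the sketch below. The customer table itself is not reproduced in this transcript, so the rows are invented stand-ins that mimic the errors named on the slides; the state list is deliberately partial, and the 0.9 similarity threshold is an arbitrary assumption. Name similarity uses `difflib.SequenceMatcher` from the standard library.

```python
from difflib import SequenceMatcher

# A deliberately partial set of valid state codes, for illustration only.
VALID_STATES = {"TN", "VA", "NY", "CA", "TX"}

# Invented rows that mimic the problems described on the slides.
customers = [
    {"row": 1, "name": "Ann Jones", "city": "", "state": "TN"},
    {"row": 3, "name": "John Smith", "city": "Memphis", "state": "TN"},
    {"row": 8, "name": "Jon Smith", "city": "Memphis", "state": "TN"},
    {"row": 10, "name": "Mary Lee", "city": "Richmond", "state": "RP"},
]


def find_customer_problems(rows, similarity=0.9):
    """Return (row or row pair, problem description) tuples."""
    problems = []
    # Single-row checks: missing and impossible values.
    for r in rows:
        if not r["city"].strip():
            problems.append((r["row"], "missing city"))
        if r["state"] not in VALID_STATES:
            problems.append((r["row"], "impossible state"))
    # Pairwise check: names that are nearly, but not exactly, the same.
    for i, a in enumerate(rows):
        for b in rows[i + 1:]:
            ratio = SequenceMatcher(
                None, a["name"].lower(), b["name"].lower()
            ).ratio()
            if ratio >= similarity and a["name"] != b["name"]:
                problems.append(((a["row"], b["row"]), "possible misspelling"))
    return problems


problems = find_customer_problems(customers)
```

The function only flags candidates; deciding whether rows 3 and 8 really are the same person is the "fix it" step, which may need a human reviewer.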

14-38

Good Reading Bookstores Before Data Cleaning

Errors in SALE:
Questionable data - is the book quantity of 21 in row 2 correct?
Impossible/out-of-range data - row 5 indicates that a single book costs $3,200.99.

14-39

Good Reading Bookstores Before Data Cleaning

Errors in SALE:
Apparently incorrect data - there is no customer number 12738, as stated in row 8.
Impossible data - row 10 shows a negative price for a book, which is impossible.
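The SALE checks lend themselves to a simple exception report. In the sketch below, the quantity of 21, the $3,200.99 price, customer number 12738, and the negative price come from the slides; every other value, the thresholds `max_quantity` and `max_price`, and the list of known customers are assumptions.

```python
known_customers = {10001, 10002, 10003}  # hypothetical customer numbers

sales = [
    {"row": 2, "customer": 10001, "quantity": 21, "price": 24.99},
    {"row": 5, "customer": 10002, "quantity": 1, "price": 3200.99},
    {"row": 8, "customer": 12738, "quantity": 1, "price": 15.00},
    {"row": 10, "customer": 10003, "quantity": 2, "price": -9.99},
    {"row": 11, "customer": 10003, "quantity": 1, "price": 12.50},
]


def exception_report(rows, customers, max_quantity=20, max_price=500.0):
    """List (row number, reason) for every row that needs scrutiny."""
    report = []
    for r in rows:
        if r["customer"] not in customers:
            report.append((r["row"], "unknown customer number"))
        if r["price"] < 0:
            report.append((r["row"], "negative price"))
        elif r["price"] > max_price:
            report.append((r["row"], "price out of range"))
        if r["quantity"] > max_quantity:
            report.append((r["row"], "questionable quantity"))
    return report


report = exception_report(sales, known_customers)
```

Rows that pass every check (row 11 here) do not appear in the report, so employees only review the exceptions.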

14-40

Data Transformation

As the data is extracted from the transactional databases, it must go through several kinds of data transformations on its way to the data warehouse:

Data from different transactional databases is merged to form the data warehouse tables.

Data will often be aggregated as it is being extracted from the transactional databases and prepared for the data warehouse.

14-41

Data Transformation

Units of measure used for attributes in different transactional databases must be reconciled as they are being merged into common data warehouse tables.

Coding schemes used for attributes in different transactional databases must be reconciled as they are being merged into common data warehouse tables.

Sometimes values from different attributes in transactional databases are combined into a single attribute in the data warehouse (e.g., employee name).
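These three transformations can be sketched on a pair of employee records from two imagined source systems. The field names, the department coding schemes, and the salary units (dollars versus thousands of dollars) are all assumptions chosen to show one example of each reconciliation.

```python
# One record from each of two hypothetical transactional systems.
from_payroll = {"first_name": "Pat", "last_name": "Lee",
                "dept": "01", "salary": 52000, "salary_unit": "dollars"}
from_hr = {"first_name": "Sam", "last_name": "Ray",
           "dept": "SALES", "salary": 48.5, "salary_unit": "thousands"}

# Maps both systems' department codes onto one warehouse coding scheme.
DEPT_CODES = {"01": "SALES", "02": "ACCT", "SALES": "SALES", "ACCT": "ACCT"}


def transform(record):
    """Reconcile units and codes, and combine the name attributes."""
    salary = record["salary"]
    if record["salary_unit"] == "thousands":
        salary = salary * 1000  # convert to the common unit: dollars
    return {
        # Two source attributes become one warehouse attribute.
        "employee_name": f'{record["first_name"]} {record["last_name"]}',
        "department": DEPT_CODES[record["dept"]],
        "salary_dollars": float(salary),
    }


warehouse_rows = [transform(r) for r in (from_payroll, from_hr)]
```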

14-42

Data Loading

After all of the extracting, cleaning, and transforming, the data is ready to be loaded into the data warehouse.

A schedule for regularly updating the data warehouse must be put in place.

14-43

Using a Data Warehouse

Online analytic processing (OLAP)

Data Mining

14-44

Online Analytic Processing

A decision support methodology based on viewing data in multiple dimensions.

There are many OLAP systems on the market today.

The OLAP environment’s multidimensional data is very well suited for querying and for multi-time period trend analyses.

Page 45: Chapter 14 The Data Warehouse Fundamentals of Database Management Systems by Mark L. Gillenson, Ph.D. University of Memphis Presentation by: Amita Goyal

14-14-4545

Online Analytic Processing

Drill-Down: Going back to the database and retrieving finer levels of data detail than you have already retrieved.

Slice: A subset of the data that focuses on a single value of one of the dimensions.

Pivot or Rotation: Merely a matter of interchanging the data dimensions.
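
The three operations above can be illustrated on a tiny cube held in plain Python. The fact rows, hospital names, and patient counts below are invented for illustration only; a real OLAP product would expose these operations through its own interface.

```python
from collections import defaultdict

# Hypothetical fact rows: (hospital, year, quarter, patients_admitted)
facts = [
    ("Central", 2003, "Q1", 120),
    ("Central", 2003, "Q2", 135),
    ("Central", 2004, "Q1", 150),
    ("Eastside", 2003, "Q1", 80),
    ("Eastside", 2003, "Q2", 95),
    ("Eastside", 2004, "Q1", 90),
]


def aggregate(rows, key):
    """Sum the patient count for each distinct key value."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)


# Summary view: patients by hospital and year.
by_year = aggregate(facts, lambda r: (r[0], r[1]))

# Drill-down: go back to the detail rows for a finer level (quarters).
by_quarter = aggregate(facts, lambda r: (r[0], r[1], r[2]))

# Slice: fix one dimension at a single value (year 2003).
slice_2003 = [r for r in facts if r[1] == 2003]

# Pivot: the same totals with the hospital and year dimensions interchanged.
pivoted = {(year, hospital): total for (hospital, year), total in by_year.items()}

print(by_year)
print(by_quarter)
print(slice_2003)
print(pivoted)
```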

Online Analytic Processing

A slice of the patient data cube.

Data Mining

The searching out of hidden knowledge in the company’s data that can give the company a competitive advantage in its marketplace.

Due to the massive volume of data warehouse data, data mining must be done by software.
Case-based learning
Decision trees
Neural networks
Genetic algorithms

Data Mining Application: Market Basket Analysis

Consider the data collected by a supermarket as it checks out its customers by scanning the bar codes on the products they’re purchasing.

The company might have software study the collected market baskets, each of which is literally the goods that a particular customer bought in one trip to the store.

The software might try to discover whether certain items “fall into” the same market basket more frequently than would otherwise be expected.

Then the items often bought in the same shopping trip can be placed next to each other in the store to remind someone buying one that they might also need the other.
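
One common way to make "more frequently than would otherwise be expected" concrete is the lift measure: the observed share of baskets containing both items, divided by the share expected if the two items were bought independently. The slides do not name a specific measure, so treat lift as an assumption here; the baskets below are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, one set of items per shopping trip.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
    {"milk"},
    {"bread", "milk"},
]

n = len(baskets)
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)


def lift(a, b):
    """Observed co-occurrence rate divided by the rate expected under independence."""
    observed = pair_counts[tuple(sorted((a, b)))] / n
    expected = (item_counts[a] / n) * (item_counts[b] / n)
    return observed / expected


# Pairs with lift above 1 share a basket more often than chance would suggest.
for a, b in sorted(pair_counts):
    if lift(a, b) > 1:
        print(a, b, round(lift(a, b), 2))
```

A lift above 1 marks a candidate pair for adjacent shelf placement; a lift below 1 means the items appear together less often than independence would predict.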

Data Mining: Lucky Rent-A-Car

CAR/RENTAL/CUSTOMER

Row  Class        ManufacturerName  Cost  CustomerNumber  Age  Income   Education
1    Compact      Ford              320   884730          54   58,000   B.A.
2    Luxury       Lincoln           850   528262          45   158,000  M.B.A.
3    Full-Size    General Motors    489   109565          48   62,000   B.S.
4    Sub-Compact  Toyota            159   532277          25   34,000   High School
5    Luxury       Lincoln           675   155434          42   125,000  Ph.D.
6    Compact      Chrysler          360   965578          64   47,500   High School
7    Mid-Size     Nissan            429   688632          31   43,000   M.B.A.
8    Luxury       Lincoln           925   342786          47   95,000   M.A.
9    Full-Size    General Motors    480   385633          51   72,000   B.S.
10   Compact      Toyota            230   464367          64   200,000  M.A.
11   Luxury       Jaguar            1170  528262          45   158,000  M.B.A.
12   Sub-Compact  Nissan            89    759930          29   28,000   B.A.
13   Full-Size    Ford              335   478432          57   53,500   B.S.
14   Full-Size    Chrysler          328   207867          29   162,000  Ph.D.

A data mining application may look for patterns in the data.

Rows 2, 5, 8, and 11 all involve rentals of luxury class cars with high-cost (revenue to the company) figures.

If, as is the case here, these similar rentals were made by people with similar demographics, a “cluster”, then future marketing can concentrate on selling this product to people with these demographics.
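
The pattern described above can be checked directly against the CAR/RENTAL/CUSTOMER rows. The sketch below is a simple filter-and-summarize pass, not one of the data mining techniques listed earlier; the tuple layout and the `GRADUATE` set are choices made for this illustration, while the data values are those of the table.

```python
from statistics import mean

# (row, class, manufacturer, cost, customer_number, age, income, education)
rentals = [
    (1, "Compact", "Ford", 320, 884730, 54, 58000, "B.A."),
    (2, "Luxury", "Lincoln", 850, 528262, 45, 158000, "M.B.A."),
    (3, "Full-Size", "General Motors", 489, 109565, 48, 62000, "B.S."),
    (4, "Sub-Compact", "Toyota", 159, 532277, 25, 34000, "High School"),
    (5, "Luxury", "Lincoln", 675, 155434, 42, 125000, "Ph.D."),
    (6, "Compact", "Chrysler", 360, 965578, 64, 47500, "High School"),
    (7, "Mid-Size", "Nissan", 429, 688632, 31, 43000, "M.B.A."),
    (8, "Luxury", "Lincoln", 925, 342786, 47, 95000, "M.A."),
    (9, "Full-Size", "General Motors", 480, 385633, 51, 72000, "B.S."),
    (10, "Compact", "Toyota", 230, 464367, 64, 200000, "M.A."),
    (11, "Luxury", "Jaguar", 1170, 528262, 45, 158000, "M.B.A."),
    (12, "Sub-Compact", "Nissan", 89, 759930, 29, 28000, "B.A."),
    (13, "Full-Size", "Ford", 335, 478432, 57, 53500, "B.S."),
    (14, "Full-Size", "Chrysler", 328, 207867, 29, 162000, "Ph.D."),
]

# Assumption for this sketch: these three values count as graduate degrees.
GRADUATE = {"M.A.", "M.B.A.", "Ph.D."}

luxury = [r for r in rentals if r[1] == "Luxury"]
others = [r for r in rentals if r[1] != "Luxury"]

profile = {
    "rows": [r[0] for r in luxury],
    "age_range": (min(r[5] for r in luxury), max(r[5] for r in luxury)),
    "mean_income": mean(r[6] for r in luxury),
    "all_graduate_degrees": all(r[7] in GRADUATE for r in luxury),
    "mean_cost_luxury": mean(r[3] for r in luxury),
    "mean_cost_others": mean(r[3] for r in others),
}
print(profile)
```

The luxury renters fall in a narrow age band in their forties, all hold graduate degrees, and their rentals cost far more on average than the rest, which is the kind of cluster the marketing department would target.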

Administering a Data Warehouse

The data warehouse requires a serious level of management.

Data warehouse administrator: a personnel specialization in the management of the data warehouse.

Three kinds of employee expertise are required:
Business expertise
Data expertise
Technical expertise

Administering a Data Warehouse: Business Expertise

An understanding of the company’s business processes that underlies an understanding of the company’s transactional data and databases.

An understanding of the company’s business goals to help in determining what data should be stored in the data warehouse for eventual OLAP and data mining purposes.

Administering a Data Warehouse: Data Expertise

An understanding of the company’s transactional data and databases for selection and integration into the data warehouse.

An understanding of the company’s transactional data and databases to design and manage data cleaning and data transformation, as necessary.

Familiarity with outside data sources for the acquisition of enrichment data.

Administering a Data Warehouse: Technical Expertise

An understanding of data warehouse design principles for the initial design.

An understanding of OLAP and data mining techniques so that the data warehouse design will properly support these processes.

Administering a Data Warehouse: Technical Expertise

An understanding of the company’s transactional databases in order to manage or coordinate the regularly scheduled appending of new data to the data warehouse.

An understanding of how to handle very large databases with their unique requirements for security, backup and recovery, being split across multiple disk devices, etc.

Challenges in Data Warehousing

Data cleaning and finding more “dirty” data than expected.

Problems associated with coordinating the regular appending of new data from the transactional databases to the data warehouse.

Difficulties in managing very large databases.

The challenge of building and maintaining the data dictionary.

“Copyright 2004 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond that permitted in Section 117 of the 1976 United States Copyright Act without express permission of the copyright owner is unlawful. Request for further information should be addressed to the Permissions Department, John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution or resale. The Publisher assumes no responsibility for errors, omissions, or damages caused by the use of these programs or from the use of the information contained herein.”