data warehousing & olap nuosang du jon b. arnason csci 5707 november 19, 2013

10
Data Warehousing & OLAP Nuosang Du Jon B. Arnason CSCI 5707 November 19, 2013

Upload: roderick-matthews

Post on 23-Dec-2015

212 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data Warehousing & OLAP Nuosang Du Jon B. Arnason CSCI 5707 November 19, 2013

Data Warehousing & OLAP

Nuosang DuJon B. ArnasonCSCI 5707November 19, 2013

Page 2: Data Warehousing & OLAP Nuosang Du Jon B. Arnason CSCI 5707 November 19, 2013

1st affect

The on-line analytical processing has some advantages such as flexible analysis function, intuitive data manipulation and the

visualization to analysis results which makes easy and efficient to the analysis based on a complex large number of data. Users

can make the right judgments quickly. It can be used to confirm the proposed complex hypothesis, the result is in the form of

graphics or tables. It doesn’t mark the abnormal information.

OLAP should be based on a large amount of historical data on different time points and the complex analysis of

multidimensional and integrated information.

OLAP requires that the user has subjective information requirements definition, so the system efficiency is better.

2nd origin

Is the concept of on-line analytical processing (OLAP), as the farther of the relational database E.F. Codd proposed in 1993, he

also put forward 12 rules of OLAP, which caused great repercussions.

Rule 1: OLAP model must provide multidimensional conceptual view

Rule 2: Transparency

Rule 3: Access ability

Rule 4: Stable report ability

Rule 5: Client/server architecture

Rule 6: Equal dimension

Rule 7: Dynamic sparse matrix processing

Rule 8: Multi-user support ability

Rule 9: Unlimited operating across dimensions

Rule 10: Direct manipulation of data

Page 3: Data Warehousing & OLAP Nuosang Du Jon B. Arnason CSCI 5707 November 19, 2013

Rule 11: Flexible report generation

Rule 12: Unlimited dimension and gathering

Time_ID

WorkdayWeek_No

Sales ID

Region_IDTime_IDProduct_IDCustomer_ID

RevenueQuantity_Sold

Pro_ID

PNameJ

Week_NoJjjjjjjjjj

Region_ID

Reg_DesCity_IDJ

City_ID

CnamePro_IDJ

Provience

City

Sales Region

Fact Table

Day

Cus_ID

NameCCat_IDJ

Week

CCat_ID

CCat_DesJ

PC_ID

PC_DesJ

PSC_ID

PSC_DesPSC_IDPC_IDJ

Prod_ID

B_IDPSC_IDJ

II_ID

Des

J

Brand

CustomerCategory

Customer Product

ProductSub_Category

ProductCategory

Snow-flake Schema

Geography Dimension

Customer Dimension

Time Dimension

Product Dimension

Page 4: Data Warehousing & OLAP Nuosang Du Jon B. Arnason CSCI 5707 November 19, 2013

3rd Category

Today’s data processing can be divided into two categories: online transaction processing (OLTP), on-line Analytical processing (OLAP).

OLTP is traditional relational database application for basic and daily transaction processing, such as bank transactions. OLAP is the

main application of data warehouse system which supporting complex analysis operation, focusing on the decision supporting, it provides

straightforward query results.

The following table lists the comparison between OLTP and OLAP.

OLTPOLAP

uUUsUSeoperators, lower-level

managers

decision makers, high-lever

managers

daily operation processing analysis and decision

database design application-oriented topic-oriented

current, the latest details, two-dimensional, discrete

historic, gathered, multi-dimensional, integrated and unified

accessread/write dozens of

recordsread millions of records

work for simple transaction complex query

size of database 100MB-GB 100GB-TB

user

function

database design

data

access

work for

size of database

Page 5: Data Warehousing & OLAP Nuosang Du Jon B. Arnason CSCI 5707 November 19, 2013

Customer_ID

Customer NameCustomer Category

4th Development background(You can find them on the internet easily.)

5th Function

In general, data warehouse system is a comprehensive enterprise database which can carry on the fast and accurate analysis to a large number of data for making better business decisions. It consists of three parts:

Page 6: Data Warehousing & OLAP Nuosang Du Jon B. Arnason CSCI 5707 November 19, 2013

1. The data layer: the implementation of enterprise operating data extraction, transformation, cleaning and summary,

forming the information, and storing in the database.

2. The application layer: through the on-line analytical processing, and even data mining application processing, realize

the analysis of the data.

3. The presentation layer: through the front desk analysis tool, the query statements, statistical analysis, and the

conclusion of multi-dimensional online analysis and data mining, it can be shown in front of the user.

6 Concepts

OLAP is shown in front of the user its multidimensional view.

D (Dimension): is the observation data of a particular perspective, it can be considered as a class attribute, the attribute sets from a D

(time Dimension, etc.).

D Level (Level): is the observation data of a particular perspective (i.e., a D), can also be different in every detail description field (time

D: date, month, quarter).

D (Member): a dimension value is the description of the data item in one dimensional position. (“yy/mm/dd” is a description of the

position on the time dimension).

M (Measure): values of multidimensional array. (Sep 2013, Minneapolis, 5707, database principle).

Operation of OLAP multidimensional analysis have Drill (Drill up and Drill down), Slice, Dice and Pivot, etc.

Drill: is to change the dimensional hierarchy and the analysis granularity. It includes the drill down and drill up. Drill up is the low level

of detail data will be summarized to a high level of summary data, or reducing dimension; and drill down, on the other hand, transforms

from the summary data into the detail data to observe or add new dimensions.

Page 7: Data Warehousing & OLAP Nuosang Du Jon B. Arnason CSCI 5707 November 19, 2013

Slice and dice: Given selected values in a part of D, we are concerned with measurement data in the distribution of the remaining

dimensions. If the rest only has two dimensions it is called sliced; if it has three or more, it is called cut.

Pivot: The transformation of dimension in the direction, reschedule the placement of D in the tables (for example, the swap of columns

and rows).

7th Architecture

The architecture of a data warehouse and OLAP are a complementary relationship. The modern OLAP system is based on data

warehousing, in other words, generally drawn for a subset of the detailed data from the data warehouse and after the necessary gathering

of the storage in the OLAP memory for the reading of the front-end analysis tools. A typical OLAP system architecture as shown below:

OLAP system according to the data storage format can be divided into relational-OLAP(ROLAP), multidimensional OLAP(MOLAP)

and hybrid-OLAP(HOLAP) three types.ROLAP MOLAP

R: Use existing relational database technologyM: Especially designed for the on-line analytical processing

R: The response speed is slower than the MOLAP;M: Existing relational database for OLAP already do a lot of optimization, including parallel storage, parallel query, parallel data management, query optimization based on the cost, the bitmap index, resulting an improved performance.

R: Data loading speed is fastM: Data loading speed is slow

Page 8: Data Warehousing & OLAP Nuosang Du Jon B. Arnason CSCI 5707 November 19, 2013

R: Storage cost is small, there is no limit to the dimensionM: Need to calculate, may lead to data explosion, dimension is limited; failing to support the dynamic change of dimension

R: Use RDBMS stored data, there is no file size limitM: The file size be limited in the operating system platform (only 10~20 g)

R: Can be realized through SQL detailed data and summary data storageM: Lacking the standards of data model and data access

The realization of the on-line analytical processing has three different methods:

1. ROLAP2. MOLAP3. Front-end display, on-line analytical processing (Desktop OLAP)

Among these, Desktop OLAP needs to download all the data to the client, and then take report format/data structure reorganization on the client. The user can realize the dynamic analysis in the native. The method is more flexible, but it can support the very limited amount of data which seriously affect the scope and efficiency of use. So it has been eliminated now.

You can find some examples below:

HyperionOracleCognosMicroStrategyIBMBrio

Page 9: Data Warehousing & OLAP Nuosang Du Jon B. Arnason CSCI 5707 November 19, 2013

Widget: Silverlight Winforms

Based on a CUBE calculation and analysis.

Page 10: Data Warehousing & OLAP Nuosang Du Jon B. Arnason CSCI 5707 November 19, 2013

Bibliography

Mailvaganam, Hari. “Introduction to OLAP”. www.dwreview.com/OLAP/Introduction_OLAP.html

Chaudhuri, Surajit; Dayal, Umeshwar; Narasayya, Vivek. “An Overview of Business Intelligence”. Communications Of The ACM, Vol. 54, No. 8, August 2011.