Data Warehousing & OLAP
Nuosang Du, Jon B. Arnason
CSCI 5707
November 19, 2013
1st Effect
On-line analytical processing (OLAP) offers flexible analysis functions, intuitive data manipulation, and visualization of analysis results, which makes analysis over large, complex data sets easy and efficient. Users can reach sound judgments quickly. OLAP can be used to confirm a proposed complex hypothesis, with the result presented as graphics or tables. It does not flag abnormal information on its own.
OLAP should be based on a large amount of historical data from different points in time, and on complex analysis of multidimensional, integrated information.
OLAP requires the user to define their information requirements up front, which makes the system more efficient.
2nd Origin
The concept of on-line analytical processing (OLAP) was proposed in 1993 by E. F. Codd, the father of the relational database. He also put forward 12 rules of OLAP, which attracted wide attention.
Rule 1: Multidimensional conceptual view
Rule 2: Transparency
Rule 3: Accessibility
Rule 4: Consistent reporting performance
Rule 5: Client/server architecture
Rule 6: Generic dimensionality
Rule 7: Dynamic sparse matrix handling
Rule 8: Multi-user support
Rule 9: Unrestricted cross-dimensional operations
Rule 10: Intuitive data manipulation
Rule 11: Flexible reporting
Rule 12: Unlimited dimensions and aggregation levels
[Figure: Snow-flake Schema. A central fact table, Sales (Sales ID, Region_ID, Time_ID, Product_ID, Customer_ID, Revenue, Quantity_Sold), is linked to four dimensions:
Geography Dimension: Sales Region (Region_ID, Reg_Des, City_ID), City (City_ID, Cname, Pro_ID), Province (Pro_ID, PName)
Customer Dimension: Customer (Cus_ID, Name, CCat_ID), Customer Category (CCat_ID, CCat_Des)
Time Dimension: Day (Time_ID, Workday, Week_No), Week (Week_No)
Product Dimension: Product (Prod_ID, B_ID, PSC_ID), Brand (B_ID, Des), Product Sub_Category (PSC_ID, PSC_Des, PC_ID), Product Category (PC_ID, PC_Des)]
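As a rough illustration, the geography branch of the snow-flake schema could be declared and queried as follows, using Python's built-in sqlite3 module. Column names follow the diagram labels where they are legible; the table names (sales, sales_region, city, province) and all sample rows are assumptions made up for this sketch, not data from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalized (snow-flaked) geography dimension: region -> city -> province.
# Table names and sample rows are hypothetical.
conn.executescript("""
CREATE TABLE province     (Pro_ID INTEGER PRIMARY KEY, PName TEXT);
CREATE TABLE city         (City_ID INTEGER PRIMARY KEY, Cname TEXT,
                           Pro_ID INTEGER REFERENCES province(Pro_ID));
CREATE TABLE sales_region (Region_ID INTEGER PRIMARY KEY, Reg_Des TEXT,
                           City_ID INTEGER REFERENCES city(City_ID));
CREATE TABLE sales        (Sales_ID INTEGER PRIMARY KEY,
                           Region_ID INTEGER REFERENCES sales_region(Region_ID),
                           Revenue REAL, Quantity_Sold INTEGER);
""")
conn.executemany("INSERT INTO province VALUES (?, ?)",
                 [(1, "North"), (2, "South")])
conn.executemany("INSERT INTO city VALUES (?, ?, ?)",
                 [(10, "City A", 1), (11, "City B", 1), (12, "City C", 2)])
conn.executemany("INSERT INTO sales_region VALUES (?, ?, ?)",
                 [(100, "Region A1", 10), (101, "Region B1", 11), (102, "Region C1", 12)])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)",
                 [(1, 100, 50.0, 5), (2, 101, 30.0, 3), (3, 102, 20.0, 2), (4, 100, 10.0, 1)])

# Summarize the fact table at province level by walking the dimension chain.
revenue_by_province = conn.execute("""
    SELECT p.PName, SUM(s.Revenue), SUM(s.Quantity_Sold)
    FROM sales s
    JOIN sales_region r ON s.Region_ID = r.Region_ID
    JOIN city c         ON r.City_ID = c.City_ID
    JOIN province p     ON c.Pro_ID = p.Pro_ID
    GROUP BY p.PName
    ORDER BY p.PName
""").fetchall()
```

The three joins are the cost of snow-flaking: each level of the geography hierarchy lives in its own table, so a province-level total has to traverse region, city, and province in turn.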
3rd Category
Today's data processing can be divided into two categories: on-line transaction processing (OLTP) and on-line analytical processing (OLAP).
OLTP is the traditional relational database application for basic, day-to-day transaction processing, such as bank transactions. OLAP is the
main application of a data warehouse system. It supports complex analysis operations, focuses on decision support, and provides
query results that are easy to understand.
The following table compares OLTP and OLAP.
                   OLTP                               OLAP
user               operators, lower-level managers    decision makers, high-level managers
function           daily operation processing         analysis and decision
database design    application-oriented               topic-oriented
data               current, latest details,           historic, aggregated, multi-dimensional,
                   two-dimensional, discrete          integrated and unified
access             read/write dozens of records       read millions of records
unit of work       simple transaction                 complex query
size of database   100MB-GB                           100GB-TB
4th Development background (You can find this on the internet easily.)
5th Function
In general, a data warehouse system is a comprehensive enterprise database that can analyze large amounts of data quickly and accurately to support better business decisions. It consists of three parts:
1. The data layer: extracts, transforms, cleans, and summarizes the enterprise's operational data, turning it into
information and storing it in the database.
2. The application layer: analyzes the data through on-line analytical processing, and even data mining
applications.
3. The presentation layer: uses front-end analysis tools to show the user the results of queries, statistical analysis,
multidimensional on-line analysis, and data mining.
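The three layers can be pictured as three small functions chained together. This is only a minimal sketch in plain Python: the record fields (city, product, amount), the sample rows, and the function names are all hypothetical and stand in for whatever a real warehouse would use.

```python
from collections import defaultdict

# Hypothetical raw operational records; names and values are illustrative only.
raw_orders = [
    {"city": " minneapolis ", "product": "Widget", "amount": "120.50"},
    {"city": "St. Paul", "product": "widget", "amount": "80"},
    {"city": "Minneapolis", "product": "Gadget", "amount": None},  # dirty row
    {"city": "Minneapolis", "product": "Gadget", "amount": "40.25"},
]

def data_layer(records):
    """Data layer: extract, clean, and transform operational rows."""
    cleaned = []
    for r in records:
        if r["amount"] is None:  # cleaning: drop rows that have no measure
            continue
        cleaned.append({
            "city": r["city"].strip().title(),
            "product": r["product"].strip().title(),
            "amount": float(r["amount"]),
        })
    return cleaned

def application_layer(warehouse, dimension):
    """Application layer: total the amount grouped by one dimension."""
    totals = defaultdict(float)
    for row in warehouse:
        totals[row[dimension]] += row["amount"]
    return dict(totals)

def presentation_layer(totals):
    """Presentation layer: render the result as a plain-text report."""
    return "\n".join(f"{k:<12} {v:>8.2f}" for k, v in sorted(totals.items()))

warehouse = data_layer(raw_orders)
totals_by_city = application_layer(warehouse, "city")
report = presentation_layer(totals_by_city)
```

Keeping the layers as separate steps means the same cleaned data can feed a different analysis (for example, grouping by "product") without touching the extraction or the report format.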
6th Concepts
OLAP presents data to the user as a multidimensional view.
Dimension: a particular perspective from which data is observed. It can be considered a class of attributes; the set of attributes forms a
dimension (the time dimension, etc.).
Dimension level: a particular perspective (i.e., a dimension) can be described at different degrees of detail (time
dimension: date, month, quarter).
Dimension member: a value of a dimension, describing the position of a data item on that dimension ("yy/mm/dd" describes a
position on the time dimension).
Measure: a value in the multidimensional array (for example, the value stored at Sep 2013, Minneapolis, 5707, database principle).
The operations of OLAP multidimensional analysis include drill (drill up and drill down), slice, dice, and pivot.
Drill: changes the level of the dimension hierarchy and the granularity of the analysis. It includes drill down and drill up. Drill up
summarizes low-level detail data into higher-level summary data, or reduces the number of dimensions; drill down does the opposite, going
from summary data to detail data or adding new dimensions.
Slice and dice: after fixing selected values on some of the dimensions, we look at how the measure is distributed over the remaining
dimensions. If only two dimensions remain, the operation is called a slice; if three or more remain, it is called a dice.
Pivot: changes the orientation of the dimensions by rearranging their placement in the table (for example, swapping rows
and columns).
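The operations above can be tried on a toy cube. The sketch below stores a three-dimensional cube as a Python dict keyed by (quarter, city, product); the dimension names, members, and sales figures are invented for illustration. Dice is implemented here as selecting a sub-cube that keeps all three dimensions, which is one common reading of the term.

```python
from collections import defaultdict

# Hypothetical cube: (quarter, city, product) -> sales measure.
DIMS = ("quarter", "city", "product")
cube = {
    ("Q1", "Minneapolis", "Widget"): 10,
    ("Q1", "Minneapolis", "Gadget"): 5,
    ("Q1", "St. Paul", "Widget"): 7,
    ("Q2", "Minneapolis", "Widget"): 12,
    ("Q2", "St. Paul", "Gadget"): 3,
}

def drill_up(cube, dims, drop):
    """Drill up by removing one dimension and summing the measure over it."""
    i = dims.index(drop)
    out = defaultdict(int)
    for key, value in cube.items():
        out[key[:i] + key[i + 1:]] += value
    return dict(out), dims[:i] + dims[i + 1:]

def slice_(cube, dims, dim, member):
    """Slice: fix one dimension to a single member; the others remain."""
    i = dims.index(dim)
    out = {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == member}
    return out, dims[:i] + dims[i + 1:]

def dice(cube, dims, allowed):
    """Dice: keep cells whose members are in the allowed sets; no dimension is dropped."""
    return {k: v for k, v in cube.items()
            if all(k[dims.index(d)] in members for d, members in allowed.items())}

def pivot(cube2d):
    """Pivot a two-dimensional result by swapping its rows and columns."""
    return {(col, row): v for (row, col), v in cube2d.items()}

by_quarter_city, dims_qc = drill_up(cube, DIMS, "product")
q1, dims_q1 = slice_(cube, DIMS, "quarter", "Q1")
widgets_mpls = dice(cube, DIMS, {"city": {"Minneapolis"}, "product": {"Widget"}})
q1_pivoted = pivot(q1)
```

Drill down is the reverse direction of drill_up: going from by_quarter_city back to the full cube, which is only possible because the detail data was kept.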
7th Architecture
The architectures of a data warehouse and OLAP are complementary. A modern OLAP system is built on a data
warehouse: in general, a subset of the detailed data is drawn from the data warehouse, aggregated as needed,
and stored in the OLAP store, where front-end analysis tools read it. A typical OLAP system architecture is shown below:
[Figure: typical OLAP system architecture]
By data storage format, OLAP systems can be divided into three types: relational OLAP (ROLAP), multidimensional OLAP (MOLAP),
and hybrid OLAP (HOLAP). ROLAP (R) and MOLAP (M) compare as follows:
R: Uses existing relational database technology.
M: Designed specifically for on-line analytical processing.
R: Response is slower than MOLAP, although existing relational databases have already been heavily optimized for OLAP (parallel storage, parallel query, parallel data management, cost-based query optimization, bitmap indexes), which improves performance.
M: Response is faster than ROLAP.
R: Data loading is fast.
M: Data loading is slow.
R: Storage cost is small; there is no limit on the number of dimensions.
M: Requires precomputation, which may lead to data explosion; the number of dimensions is limited; dynamic changes to dimensions are not supported.
R: Data is stored in the RDBMS, so there is no file size limit.
M: File size is limited by the operating system platform (only about 10-20 GB).
R: Both detailed data and summary data can be stored and accessed through SQL.
M: Lacks standards for the data model and for data access.
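The trade-off can be illustrated with a toy comparison. In this sketch, the ROLAP side keeps detail rows in a relational table (Python's built-in sqlite3) and aggregates with SQL at query time, while the MOLAP side precomputes an aggregate for every combination of dimensions up front. The table name, dimension names, and rows are hypothetical, and real MOLAP engines use specialized array storage rather than Python dicts.

```python
import sqlite3
from collections import defaultdict
from itertools import combinations

# Hypothetical detail rows: (quarter, city, product, amount).
rows = [
    ("Q1", "Minneapolis", "Widget", 10),
    ("Q1", "St. Paul", "Widget", 7),
    ("Q2", "Minneapolis", "Gadget", 12),
]
DIMS = ("quarter", "city", "product")

# ROLAP-style: store detail rows relationally, aggregate with SQL on demand.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (quarter TEXT, city TEXT, product TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

def rolap_total_by_city():
    """Compute the aggregate at query time with GROUP BY."""
    return dict(conn.execute("SELECT city, SUM(amount) FROM sales GROUP BY city"))

# MOLAP-style: precompute every group-by (2**n of them for n dimensions).
def build_molap(rows, dims):
    cube = {}
    for r in range(len(dims) + 1):
        for group in combinations(range(len(dims)), r):
            agg = defaultdict(int)
            for row in rows:
                agg[tuple(row[i] for i in group)] += row[-1]
            cube[tuple(dims[i] for i in group)] = dict(agg)
    return cube

molap = build_molap(rows, DIMS)
city_totals_rolap = rolap_total_by_city()
city_totals_molap = molap[("city",)]  # a lookup, no computation at query time
```

Loading the ROLAP table is a plain insert, but every query pays for the aggregation. The MOLAP build is slower and stores one aggregate table per subset of dimensions, which is where the data explosion and the practical limit on dimensions come from; in exchange, each query is just a lookup.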
On-line analytical processing can be implemented in three different ways:
1. ROLAP
2. MOLAP
3. Front-end display OLAP (Desktop OLAP)
Of these, Desktop OLAP downloads all the data to the client and then reorganizes the report format and data structure there, so the user can perform dynamic analysis locally. The method is flexible, but it supports only a very limited amount of data, which seriously restricts its scope and efficiency. As a result, it has largely been abandoned.
You can find some examples below:
Hyperion, Oracle, Cognos, MicroStrategy, IBM, Brio
Widget: Silverlight, Winforms
Based on CUBE calculation and analysis.
Bibliography
Mailvaganam, Hari. “Introduction to OLAP”. www.dwreview.com/OLAP/Introduction_OLAP.html
Chaudhuri, Surajit; Dayal, Umeshwar; Narasayya, Vivek. “An Overview of Business Intelligence”. Communications of the ACM, Vol. 54, No. 8, August 2011.