On Line Analytical Modeling
-
8/11/2019 On Line Analytical Modeling
1/74
-
17/08/2014 2
What is OLAP?
On-Line: a process controlled by a computer.
Analytical Processing: processing that needs analytical data.
Analytical Data: data that involve analysis. Analytical data consist of business data.
Business Data: time, customers, sales, stores, products, etc.
[Figure: Business Data, Analytical Data, Analytical Processing, Client]
-
Interactive, exploratory analysis of multidimensional data to discover patterns
[Figure: example cube with dimensions age, accidents and gender]
-
Online analytical processing is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.
-
Advanced data analysis environment that supports decision making, business modeling, and operations research activities
Characteristics of OLAP:
Use multidimensional data analysis techniques
Provide advanced database support
Provide easy-to-use end-user interfaces
Support client/server architecture
Facilitate interactive query and complex analysis for the user
Allow drill down or roll up
Ability to perform intricate calculations and comparisons
Present results in meaningful ways, such as charts and graphs
-
August 17, 2014 Data Mining: Concepts and Techniques 10
feature | OLTP | OLAP
users | clerk, IT professional | knowledge worker
function | day-to-day operations | decision support
DB design | application-oriented | subject-oriented
data | current, up-to-date, detailed, flat relational, isolated | historical, summarized, multidimensional, integrated, consolidated
usage | repetitive | ad-hoc
access | read/write, index/hash on prim. key | lots of scans
unit of work | short, simple transaction | complex query
# records accessed | tens | millions
# users | thousands | hundreds
DB size | 100 MB-GB | 100 GB-TB
metric | transaction throughput | query throughput, response
-
Codd's twelve rules for OLAP:
Multidimensional conceptual view
Transparency
Accessibility
Consistent reporting performance
Client-server architecture
Generic dimensionality
Dynamic sparse matrix handling
Multi-user support
Unrestricted cross-dimensional operations
Intuitive data manipulation
Flexible reporting
Unlimited dimensions and aggregation levels
-
17/08/2014
Theodoros CHRYSAFIS - Academix
s3ctit03 -www.city.academic.gr/academix 14
OLAP Taxonomy
Multi-dimensional OLAP (MOLAP): a k-dimensional matrix based on a non-relational storage structure (Agrawal et al.).
Relational OLAP (ROLAP): a relational back-end wherein operations on the data are translated to relational queries (Agrawal et al.).
Hybrid OLAP (HOLAP): integration of MOLAP and ROLAP.
Desktop OLAP (DOLAP): provides a specific cube for analysis; a simplified version of MOLAP or ROLAP.
-
Brings OLAP functionality to multidimensional databases (MDBMS)
Stores data in a multidimensional data cube; N-dimensional cubes are called hypercubes
Caching the cube in memory speeds up processing
Performance is affected by how the database system handles the density of the data cube (its sparsity)
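To illustrate why sparsity matters, the sketch below compares a dense hypercube, which reserves a cell for every coordinate combination, with a sparse store that keeps only non-empty cells. The dimension members and values are hypothetical (they reuse the small sales example from later slides); real MDBMS products use their own, often proprietary, storage formats.

```python
from itertools import product

# Hypothetical members of a 3-dimensional hypercube.
dimensions = {
    "product": ["p1", "p2"],
    "store": ["c1", "c2", "c3"],
    "date": [1, 2],
}

# Sparse storage: only non-empty cells are kept, keyed by (product, store, date).
sparse_cube = {
    ("p1", "c1", 1): 12,
    ("p2", "c1", 1): 11,
    ("p1", "c3", 1): 50,
    ("p2", "c2", 1): 8,
    ("p1", "c1", 2): 44,
    ("p1", "c2", 2): 4,
}


def dense_cell_count(dims):
    """Cells a dense array would allocate: the product of all cardinalities."""
    count = 1
    for members in dims.values():
        count *= len(members)
    return count


def sparsity(cube, dims):
    """Fraction of the dense hypercube's cells that hold no data."""
    return 1 - len(cube) / dense_cell_count(dims)


# Dense layout for comparison: a slot for every coordinate combination,
# with None marking an empty cell.
dense_cube = {coords: sparse_cube.get(coords) for coords in product(*dimensions.values())}

print(dense_cell_count(dimensions), len(sparse_cube), sparsity(sparse_cube, dimensions))
```

Here half of the 12 possible cells are empty, so the sparse layout holds 6 entries where the dense one reserves 12; with more dimensions the gap usually grows much faster.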
-
Provides OLAP functionality using relational DB query tools
Extensions to the RDBMS:
Multidimensional data schema support
Data access language and query performance optimized for multidimensional data
Support for very large databases (VLDBs)
-
General features
The basic features are:
Multidimensional analysis
Consistent performance
Fast response time
Drill down and roll up
Navigation in and out of details
Slice and dice
Rotation
Multiple view modes
Easy scalability
Time intelligence
-
Three-Dimensional Cube Display
[Figure: page = Region (North); columns = Sales of Red blob, Blue blob and Total; rows = Year (1996, 1997 and Total)]
-
Six-Dimensional Cube
Dimension | Example
Brand | Mt. Airy
Store | Atlanta
Customer segment | Business
Product group | Desks
Period | January
Variable | Units sold
-
MDS structure
A hypercube is a general metaphor for representing multidimensional data.
-
Region | Sales variance
Africa | 105%
Asia | 57%
Europe | 122%
North America | 97%
Pacific | 85%
South America | 163%

Nation | Sales variance
China | 123%
Japan | 52%
India | 87%
Singapore | 95%
-
Just a snippet from http://www.olapreport.com/ProductsIndex.htm; not a complete list.
-
Advance Database Techniques 28
MOLAP
The database is stored in a special structure that is optimized for multidimensional analysis.
Data is aggregated and stored according to predicted usage.
Very fast query response time, as data is mostly pre-calculated.
Systems are best used when data is desired for a specific application.
Tight coupling between the application and presentation layers.
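A minimal sketch of "aggregated and stored according to predicted usage": aggregates are built up front only for the groupings expected to be queried, and queries are answered by lookup rather than by scanning the facts. The sample facts, the `precompute`/`query` names and the chosen groupings are all hypothetical.

```python
from collections import defaultdict

# Hypothetical detail facts: (product, store, date, amount).
FACTS = [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
]
DIM_INDEX = {"product": 0, "store": 1, "date": 2}


def precompute(facts, predicted_groupings):
    """Build one aggregate table per grouping that is expected to be queried."""
    store = {}
    for grouping in predicted_groupings:
        totals = defaultdict(int)
        for row in facts:
            key = tuple(row[DIM_INDEX[d]] for d in grouping)
            totals[key] += row[3]
        store[grouping] = dict(totals)
    return store


# Assumed "predicted usage": totals by date and by (product, date).
MOLAP_STORE = precompute(FACTS, [("date",), ("product", "date")])


def query(grouping, key):
    """Answer from pre-calculated values; the detail facts are not scanned."""
    return MOLAP_STORE[grouping].get(key, 0)


print(query(("date",), (1,)), query(("product", "date"), ("p1", 2)))
```

A grouping that was not predicted (for example by store) raises `KeyError` here, which mirrors why such systems suit a specific, known application best.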
-
MOLAP
Practical limit on size: the time taken to calculate the database and the space required to hold these pre-calculated values
Good for smaller storage spaces (< 50 GB)
Navigation of data is limited
Costly to maintain
Does not scale well
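A back-of-the-envelope sketch of the size limit: with n dimensions a fully pre-calculated cube has 2**n group-by combinations, and if dimension i has C_i members, the cube can hold up to the product of (C_i + 1) cells (each dimension takes one of its members or "all"). The cardinalities used below are made-up numbers for illustration.

```python
from math import prod


def groupby_count(num_dims):
    """Each dimension is either grouped on or aggregated away: 2**n views."""
    return 2 ** num_dims


def max_cube_cells(cardinalities):
    """Upper bound on cells: each dimension is one of its members or 'all'."""
    return prod(c + 1 for c in cardinalities)


# Hypothetical cardinalities: 1,000 products, 200 stores, 365 days.
cards = [1000, 200, 365]
print(groupby_count(len(cards)), max_cube_cells(cards))
```

Even these three modest dimensions allow 73,639,566 potential cells across 8 views, which is why both calculation time and storage limit how much data a MOLAP cube can hold.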
-
Advantages
Excellent performance: MOLAP cubes are built for fast data retrieval and are optimal for slicing and dicing operations.
Can perform complex calculations: all calculations have been pre-generated when the cube is created. Hence, complex calculations are not only doable, but they return quickly.
-
MOLAP
Disadvantages:
Handles limited data: because all calculations are performed when the cube is built, it is not possible to include a large amount of data in the cube itself.
Requires additional investment: cube technologies are often proprietary and do not already exist in the organization. Therefore, to adopt MOLAP technology, chances are that additional investments in human and capital resources are needed.
-
ROLAP is an alternative to MOLAP technology.
ROLAP differs significantly in that it does not require the pre-computation and storage of information.
ROLAP tools access the data in a relational database and generate SQL queries to calculate information at the appropriate level when an end user requests it.
It is possible to create additional database tables (summary tables and aggregations) which summarize the data at any desired combination of dimensions.
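A minimal sketch of the ROLAP idea: the requested level of aggregation is translated into a SQL `GROUP BY` query that runs on demand against a relational table. SQLite (via Python's `sqlite3`) stands in for the relational back end; the `sale` table reuses the example data from later slides, and `build_sql`/`rolap_query` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
     ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)],
)

# Whitelist of dimension columns: user input is never spliced into SQL unchecked.
ALLOWED_DIMS = {"prodId", "storeId", "date"}


def build_sql(dims):
    """Translate a requested aggregation level into a GROUP BY query."""
    unknown = set(dims) - ALLOWED_DIMS
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    if not dims:
        return "SELECT sum(amt) FROM sale"
    cols = ", ".join(dims)
    return f"SELECT {cols}, sum(amt) FROM sale GROUP BY {cols} ORDER BY {cols}"


def rolap_query(dims):
    """Compute the answer at request time; nothing is pre-stored."""
    return conn.execute(build_sql(dims)).fetchall()


print(rolap_query(["date"]))
print(rolap_query(["date", "prodId"]))
```

A summary table, as mentioned above, would simply be the result of one such query saved with `CREATE TABLE ... AS SELECT ...` so that later requests at that level can read it directly.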
-
The database is a standard relational database and the database model is a multidimensional model, often referred to as a star or snowflake model or schema.
-
ROLAP
A multi-dimensional user view on relational data storage using star or snowflake database schemata.
[Figure: a star schema and a snowflake schema, each with a Sales fact table at the centre. Dimension tables shown: ProductDimension, TimeDimension, RegionDimension, CustomerDimension; ProductDimension, YearDimension, CountryDimension, CustomerDimension; snowflake sub-tables CustomerCharacteristics, ProductKind, Region, Month]
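A minimal star schema, sketched in SQLite: a `sale` fact table whose foreign keys point to `product` and `store` dimension tables, queried with a star join that groups by descriptive dimension attributes instead of raw keys. Table and column names, and the store-to-region assignment, are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id TEXT PRIMARY KEY, name TEXT, price INTEGER);
CREATE TABLE store   (id TEXT PRIMARY KEY, region TEXT);
CREATE TABLE sale    (prodId  TEXT REFERENCES product(id),
                      storeId TEXT REFERENCES store(id),
                      date INTEGER, amt INTEGER);
INSERT INTO product VALUES ('p1', 'bolt', 10), ('p2', 'nut', 5);
INSERT INTO store   VALUES ('c1', 'A'), ('c2', 'B'), ('c3', 'B');
INSERT INTO sale VALUES ('p1','c1',1,12), ('p2','c1',1,11), ('p1','c3',1,50),
                        ('p2','c2',1,8),  ('p1','c1',2,44), ('p1','c2',2,4);
""")

# Star join: the fact table joins each dimension on its key, then the
# result is aggregated by dimension attributes (product name, region).
STAR_QUERY = """
SELECT p.name, s.region, sum(f.amt)
FROM sale AS f
JOIN product AS p ON f.prodId = p.id
JOIN store   AS s ON f.storeId = s.id
GROUP BY p.name, s.region
ORDER BY p.name, s.region
"""
rows = conn.execute(STAR_QUERY).fetchall()
print(rows)
```

In a snowflake variant, `region` would move into its own table referenced from `store`, which normalizes the dimension at the cost of one more join per query.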
-
ROLAP
Advantages: easy to understand, easy to model, easy to implement.
Further research on dynamic optimisation, on meta-models, on functional extensions for the ROLAP engines, and on user-defined functions for OLAP.
-
Advantages:
Can handle large amounts of data: the data size limitation of ROLAP technology is the limitation on data size of the underlying relational database. In other words, ROLAP itself places no limitation on data amount.
Can leverage functionalities inherent in the relational database: often, the relational database already comes with a host of functionalities. ROLAP technologies, since they sit on top of the relational database, can therefore leverage these functionalities.
Easy to understand, easy to model, easy to implement.
-
ROLAP
Disadvantages:
Performance can be slow: because each ROLAP report is essentially a SQL query (or multiple SQL queries) in the relational database, the query time can be long if the underlying data size is large.
Limited by SQL functionalities: because ROLAP technology mainly relies on generating SQL statements to query the relational database, and SQL statements do not fit all needs (for example, it is difficult to perform complex calculations using SQL), ROLAP technologies are traditionally limited by what SQL can do. ROLAP vendors have mitigated this risk by building out-of-the-box complex functions into the tool, as well as the ability to allow users to define their own functions.
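As an illustration of extending SQL with a user-defined function, the sketch below registers a `median` aggregate in SQLite through Python's `sqlite3.Connection.create_aggregate`; SQLite has no built-in median. The table and data are hypothetical, and commercial ROLAP tools expose their own extension mechanisms rather than this one.

```python
import sqlite3
import statistics


class Median:
    """User-defined aggregate: SQLite has no built-in median()."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # Called once per row in the group; NULLs are skipped like other aggregates.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        # Called once per group to produce the aggregate result.
        return statistics.median(self.values) if self.values else None


conn = sqlite3.connect(":memory:")
conn.create_aggregate("median", 1, Median)
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
     ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)],
)
rows = conn.execute(
    "SELECT prodId, median(amt) FROM sale GROUP BY prodId ORDER BY prodId"
).fetchall()
print(rows)
```

Once registered, `median` can be used in generated SQL exactly like `sum` or `avg`.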
-
ROLAP vs. MOLAP, and HOLAP
-
HOLAP
A hybrid of ROLAP and MOLAP, which can be thought of as a virtual database whereby the higher levels of the database are implemented as MOLAP and the lower levels as ROLAP.
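A minimal sketch of that split: higher-level totals (per date, per product) are pre-aggregated in memory as a MOLAP-style summary, while lower-level requests (per store) go to a relational table as ROLAP-style SQL. The data and the `holap_total` helper are hypothetical; real HOLAP servers decide the split through their own configuration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical detail facts: (prodId, storeId, date, amt).
FACTS = [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
]

# ROLAP part: the detailed (lower) level stays in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", FACTS)

# MOLAP part: higher levels (totals per date and per product) pre-aggregated.
summary = {"date": defaultdict(int), "prodId": defaultdict(int)}
for prod_id, store_id, date, amt in FACTS:
    summary["date"][date] += amt
    summary["prodId"][prod_id] += amt


def holap_total(dim, value, store_id=None):
    """Return (total, source): summary levels from memory, detail via SQL."""
    if dim not in summary:  # whitelist, since dim is spliced into the SQL text
        raise ValueError(f"unsupported dimension: {dim}")
    if store_id is None:
        return summary[dim].get(value, 0), "MOLAP"
    sql = f"SELECT coalesce(sum(amt), 0) FROM sale WHERE {dim} = ? AND storeId = ?"
    (total,) = conn.execute(sql, (value, store_id)).fetchone()
    return total, "ROLAP"


print(holap_total("date", 1))
print(holap_total("prodId", "p1", "c1"))
```

The caller sees one interface either way, which is the transparency between MOLAP and ROLAP storage that HOLAP aims for.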
-
HOLAP (contd.)
A system which supports (and integrates) multi-dimensional and relational storage of data in an equivalent manner, in order to benefit from the corresponding characteristics and optimization techniques.
Advantages: use of the best techniques introduced in MOLAP and ROLAP; transparency between MOLAP and ROLAP systems.
-
HOLAP (contd.)
Development Issues
Results in lots of data redundancy
It allows users to build custom cubes, causing data inconsistencies
Only limited amounts of data can be maintained efficiently
Almost all systems utilize HOLAP in some respects
-
DOLAP (Desktop OLAP)
The previous terms refer to server-based OLAP technologies.
DOLAP enables users to quickly pull together small cubes that run on their desktops or laptops.
-
2014.08.17.OLAP operations 52
-
Roll up (drill-up): summarize data by climbing up a hierarchy or by dimension reduction
Drill down (roll down): the reverse of roll-up; go from a higher-level summary to a lower-level summary or detailed data, or introduce new dimensions
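Both kinds of roll-up, and the matching drill-down, can be sketched in a few lines of Python. The facts reuse the sale example from the later slides; the store-to-region mapping (c1 in region A, c2 and c3 in region B) follows the hierarchy slide, and the helper names are hypothetical.

```python
from collections import defaultdict

# Facts as (product, store, date, amt).
FACTS = [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
]
# One level of the store hierarchy: store -> region.
STORE_TO_REGION = {"c1": "A", "c2": "B", "c3": "B"}


def aggregate(facts, key_fn):
    """Sum amounts grouped by whatever key key_fn derives from each fact."""
    totals = defaultdict(int)
    for row in facts:
        totals[key_fn(row)] += row[3]
    return dict(totals)


# Detailed level: (product, store), summed over dates.
by_store = aggregate(FACTS, lambda r: (r[0], r[1]))
# Roll up by climbing the hierarchy: store -> region.
by_region = aggregate(FACTS, lambda r: (r[0], STORE_TO_REGION[r[1]]))
# Roll up by dimension reduction: drop the store dimension entirely.
by_product = aggregate(FACTS, lambda r: r[0])


def drill_down(region):
    """Reverse of the roll-up: expand one region back into its stores."""
    return {k: v for k, v in by_store.items() if STORE_TO_REGION[k[1]] == region}


print(by_region)
print(drill_down("B"))
```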
-
OLAP operations II.
Slice and dice: project and select
Pivot (rotate): reorient the cube; visualization, 3D to a series of 2D planes
Other operations:
Drill across: involving (across) more than one fact table
Drill through: through the bottom level of the cube to its back-end relational tables
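The pivot described above can be sketched as follows: the 3D cube (product × store × date) is shown as a series of 2D planes, one per date, and a plane is rotated so that stores become rows and products become columns. The sample facts and helper names are hypothetical.

```python
# Facts as (product, store, date, amt).
FACTS = [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
]
PRODUCTS = ["p1", "p2"]
STORES = ["c1", "c2", "c3"]


def plane(facts, date):
    """One 2D plane (products x stores) of the cube for a fixed date.

    Assumes at most one fact per (product, store, date); None marks an empty cell.
    """
    grid = {p: {s: None for s in STORES} for p in PRODUCTS}
    for prod, store, d, amt in facts:
        if d == date:
            grid[prod][store] = amt
    return grid


def pivot(grid):
    """Rotate a plane: stores become rows and products become columns."""
    return {s: {p: grid[p][s] for p in PRODUCTS} for s in STORES}


# The 3D cube viewed as a series of 2D planes, one per date.
planes = {d: plane(FACTS, d) for d in (1, 2)}
rotated_day1 = pivot(planes[1])
print(planes[1])
print(rotated_day1)
```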
-
sale prodId storeId date amt
p1 c1 1 12
p2 c1 1 11
p1 c3 1 50
p2 c2 1 8
p1 c1 2 44
p1 c2 2 4
Add up amounts for day 1
In SQL: SELECT sum(amt) FROM SALE WHERE date = 1
Result: 81
-
sale prodId storeId date amt
p1 c1 1 12
p2 c1 1 11
p1 c3 1 50
p2 c2 1 8
p1 c1 2 44
p1 c2 2 4
Add up amounts by day
In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date
ans date sum
1 81
2 48
-
sale prodId storeId date amt
p1 c1 1 12
p2 c1 1 11
p1 c3 1 50
p2 c2 1 8
p1 c1 2 44
p1 c2 2 4
Add up amounts by day and product
In SQL: SELECT date, prodId, sum(amt) FROM SALE GROUP BY date, prodId
sale prodId date amt
p1 1 62
p2 1 19
p1 2 48
(drill-down: from totals by day to totals by day and product; rollup: the reverse)
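The three queries above can be checked by running them against an in-memory SQLite database through Python's `sqlite3` module. The table definition (column types in particular) is an assumption; the rows are the `sale` rows from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
     ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)],
)

# Total for a single day.
day1_total = conn.execute("SELECT sum(amt) FROM sale WHERE date = 1").fetchone()[0]
# Rolled up to days.
by_day = conn.execute(
    "SELECT date, sum(amt) FROM sale GROUP BY date ORDER BY date"
).fetchall()
# Drilled down: the product dimension is added to the grouping.
by_day_product = conn.execute(
    "SELECT date, prodId, sum(amt) FROM sale GROUP BY date, prodId ORDER BY date, prodId"
).fetchall()

print(day1_total, by_day, by_day_product)
```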
-
Operators: sum, count, max, min, median, avg
HAVING clause
Using the dimension hierarchy:
average by region (within store)
maximum by month (within date)
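A sketch combining these ideas in SQLite: the store dimension is climbed to region through a lookup table, amounts are averaged per region, and `HAVING` filters whole groups after aggregation (unlike `WHERE`, which filters rows). The region assignment follows the later hierarchy slide (c1 in region A; c2, c3 in region B); the `store` table and the threshold of 65 are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sale  (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER);
CREATE TABLE store (id TEXT PRIMARY KEY, region TEXT);
INSERT INTO sale VALUES ('p1','c1',1,12), ('p2','c1',1,11), ('p1','c3',1,50),
                        ('p2','c2',1,8),  ('p1','c1',2,44), ('p1','c2',2,4);
INSERT INTO store VALUES ('c1','A'), ('c2','B'), ('c3','B');
""")

# Average by region, keeping only regions whose total exceeds the threshold.
rows = conn.execute("""
    SELECT s.region, avg(f.amt), sum(f.amt)
    FROM sale AS f JOIN store AS s ON f.storeId = s.id
    GROUP BY s.region
    HAVING sum(f.amt) > 65
""").fetchall()
print(rows)
```

Region A totals 67 and region B totals 62, so only region A survives the `HAVING` filter.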
-
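Day 1:
     c1  c2  c3
p1   12   -  50
p2   11   8   -
Day 2:
     c1  c2  c3
p1   44   4   -
p2    -   -   -
Summed over dates:
     c1  c2  c3
p1   56   4  50
p2   11   8   -
Summed over dates and products:
     c1  c2  c3
sum  67  12  50
Summed over dates and stores:
p1   110
p2    19
total 129
. . .
Labelled cells: sale(c1,*,*) = 67, sale(*,*,*) = 129, sale(c2,p2,*) = 8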
-
Day 1:
     c1  c2  c3    *
p1   12   -  50   62
p2   11   8   -   19
*    23   8  50   81
Day 2:
     c1  c2  c3    *
p1   44   4   -   48
p2    -   -   -    -
*    44   4   -   48
All dates (*):
     c1  c2  c3    *
p1   56   4  50  110
p2   11   8   -   19
*    67  12  50  129
Labelled cell: sale(*,p2,*) = 19
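The cube above, with "*" standing for "all values" of a dimension, can be computed by adding each fact into all 2**3 cells formed by either keeping or replacing each of its coordinates with "*". This is a minimal in-memory sketch; the coordinate order (store, product, date) follows the `sale(c1,*,*)` notation, and the helper name `data_cube` is hypothetical.

```python
from collections import defaultdict
from itertools import product

ALL = "*"

# Facts ordered as (storeId, prodId, date, amt) to match sale(store, product, date).
FACTS = [
    ("c1", "p1", 1, 12), ("c1", "p2", 1, 11), ("c3", "p1", 1, 50),
    ("c2", "p2", 1, 8), ("c1", "p1", 2, 44), ("c2", "p1", 2, 4),
]


def data_cube(facts):
    """Add each fact into every cell formed by keeping or '*'-ing each coordinate."""
    cube = defaultdict(int)
    for store, prod, date, amt in facts:
        for coords in product((store, ALL), (prod, ALL), (date, ALL)):
            cube[coords] += amt
    return dict(cube)


cube = data_cube(FACTS)
for cell in [("c1", ALL, ALL), (ALL, ALL, ALL), ("c2", "p2", ALL), (ALL, "p2", ALL)]:
    print(f"sale({','.join(map(str, cell))}) = {cube[cell]}")
```

This prints the labelled values from the slides: 67, 129, 8 and 19. Databases that support SQL's `GROUP BY CUBE` compute the same set of aggregates declaratively.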
-
Day 1:
     c1  c2  c3
p1   12   -  50
p2   11   8   -
Day 2:
     c1  c2  c3
p1   44   4   -
p2    -   -   -
Rolled up to regions (all dates):
     region A  region B
p1         56        54
p2         11         8
Dimension hierarchy: customer → region → country
(customer c1 in Region A; customers c2, c3 in Region B)
-
Fact table view:
sale prodId storeId date amt
p1 c1 1 12
p2 c1 1 11
p1 c3 1 50
p2 c2 1 8
p1 c1 2 44
p1 c2 2 4
Multi-dimensional cube:
Day 1:
     c1  c2  c3
p1   12   -  50
p2   11   8   -
Day 2:
     c1  c2  c3
p1   44   4   -
p2    -   -   -
Summed over dates:
     c1  c2  c3
p1   56   4  50
p2   11   8   -
-
Slicing and Dicing
Slicing is selecting a group of cells from the entire multidimensional array by specifying a specific value for one or more dimensions.
Dicing involves selecting a subset of cells by specifying a range of attribute values. This is equivalent to defining a subarray from the complete array.
In practice, both operations can also be accompanied by aggregation over some dimensions.
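Slicing and dicing on the sale facts can be sketched as filters: a slice fixes a dimension to one value, a dice keeps coordinates inside a set or range per dimension, and either result can then be aggregated. The sample data, the `DIM` mapping and the helper names are hypothetical.

```python
# Facts as (product, store, date, amt).
FACTS = [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
]
DIM = {"product": 0, "store": 1, "date": 2}


def slice_cube(facts, **fixed):
    """Slice: fix one or more dimensions to a single value each."""
    return [r for r in facts if all(r[DIM[d]] == v for d, v in fixed.items())]


def dice_cube(facts, **allowed):
    """Dice: keep cells whose coordinates fall in a given set or range per dimension."""
    return [r for r in facts if all(r[DIM[d]] in vals for d, vals in allowed.items())]


day2 = slice_cube(FACTS, date=2)
sub = dice_cube(FACTS, store={"c1", "c2"}, date=range(1, 2))
# Either result can then be aggregated over the remaining dimensions.
print(sum(r[3] for r in day2), sum(r[3] for r in sub))
```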
-
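[Figure: age index, a tree with keys 20 and 23 at the top level and 18, 19, 20, 21, 22, 23, 25, 26 at the leaf level, pointing into the data records]
Data records:
id name age
1 joe 20
2 fred 20
3 sally 21
4 nancy 20
5 tom 20
6 pat 25
7 dave 21
8 jeff 26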
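The index in the figure appears to be tree-structured; the sketch below gets the same lookup behaviour from a sorted list of (age, record id) pairs searched with Python's `bisect`, so a range query touches only matching entries instead of scanning every record. This substitutes a sorted list for the tree, and `ages_between` is a hypothetical name; the records are the ones from the slide.

```python
import bisect

# Data records from the slide: (id, name, age).
RECORDS = [
    (1, "joe", 20), (2, "fred", 20), (3, "sally", 21), (4, "nancy", 20),
    (5, "tom", 20), (6, "pat", 25), (7, "dave", 21), (8, "jeff", 26),
]

# Secondary index on age: sorted (age, id) pairs pointing back to the records.
AGE_INDEX = sorted((age, rid) for rid, _name, age in RECORDS)
BY_ID = {rid: (rid, name, age) for rid, name, age in RECORDS}


def ages_between(lo, hi):
    """Records with lo <= age <= hi, located by binary search on the index."""
    start = bisect.bisect_left(AGE_INDEX, (lo, float("-inf")))
    end = bisect.bisect_right(AGE_INDEX, (hi, float("inf")))
    return [BY_ID[rid] for _age, rid in AGE_INDEX[start:end]]


print(ages_between(21, 25))
```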
-
sale prodId storeId date amt
p1 c1 1 12
p2 c1 1 11
p1 c3 1 50
p2 c2 1 8
p1 c1 2 44
p1 c2 2 4
Combine the SALE and PRODUCT relations
In SQL: SELECT * FROM SALE, PRODUCT WHERE SALE.prodId = PRODUCT.id
product id name price
p1 bolt 10
p2 nut 5
joinTb prodId name price storeId date amt
p1 bolt 10 c1 1 12
p2 nut 5 c1 1 11
p1 bolt 10 c3 1 50
p2 nut 5 c2 1 8
p1 bolt 10 c1 2 44
p1 bolt 10 c2 2 4
-
product id name price jIndex
p1 bolt 10 r1,r3,r5,r6
p2 nut 5 r2,r4
sale rId prodId storeId date amt
r1 p1 c1 1 12
r2 p2 c1 1 11
r3 p1 c3 1 50
r4 p2 c2 1 8
r5 p1 c1 2 44
r6 p1 c2 2 4
Join index (jIndex): each product row stores the ids of the sale rows it joins with.
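A minimal sketch of building and using such a join index: the ids of the matching sale rows are recorded once per product, so a later join query just follows the stored row ids instead of recomputing the join. The dictionaries mirror the tables above; `build_join_index` and `sales_for` are hypothetical names.

```python
from collections import defaultdict

PRODUCT = {"p1": ("bolt", 10), "p2": ("nut", 5)}  # id -> (name, price)
SALE = {  # rId -> (prodId, storeId, date, amt)
    "r1": ("p1", "c1", 1, 12), "r2": ("p2", "c1", 1, 11), "r3": ("p1", "c3", 1, 50),
    "r4": ("p2", "c2", 1, 8), "r5": ("p1", "c1", 2, 44), "r6": ("p1", "c2", 2, 4),
}


def build_join_index(sale):
    """For each product id, record the ids of the sale rows that join with it."""
    index = defaultdict(list)
    for rid, row in sorted(sale.items()):
        index[row[0]].append(rid)
    return dict(index)


J_INDEX = build_join_index(SALE)


def sales_for(product_name):
    """Answer a join query by following stored row ids; no join is recomputed."""
    return [
        SALE[rid]
        for prod_id, (name, _price) in PRODUCT.items() if name == product_name
        for rid in J_INDEX.get(prod_id, [])
    ]


print(J_INDEX)
print(sales_for("nut"))
```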
-
Bitmapped join index
File organisation
-
Web-based OLAP
Web OLAP approaches:
Browser plug-ins
Pre-created HTML documents
OLAP in the server