Slowly Changing Measures. Mathias Goller, Stefan Berger. Johannes Kepler University, Austria, 2013


Slowly Changing Measures

Mathias Goller, Stefan Berger

Johannes Kepler University, Austria, 2013

Data mining Lab. Gahee Lee

2015.07.14


contents

introduction

SCD

example scenario

SCM

conclusion


introduction

DW - star schema (fact, dimension)

Fact data : summarized quantitative measurements

Dimension data : qualitative attributes stored along with the measures

Members : The instances of dimensions

compose the multi-dimensional analysis space needed to aggregate the measures of the (hyper-)cube data inside the DW.

OLAP : provides business analysts with typical operations

roll-up, …

update dimension data (measure change)

SCD : dimensional modeling of DWs

SCM (proposed method)
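As a rough illustration of the fact/dimension split and of roll-up, the sketch below aggregates a net sales measure from day level to month level. The table layout, the field names (`brand`, `date`, `net_sales`, `month`) and the sample values are assumptions made up for this example, not taken from the slides.

```python
from collections import defaultdict

# Hypothetical date dimension: each member (a day) carries its month attribute.
date_dim = {
    "2013-01-01": {"month": "2013-01"},
    "2013-01-02": {"month": "2013-01"},
    "2013-02-01": {"month": "2013-02"},
}

# Hypothetical fact rows: one net sales measure per (brand, date).
facts = [
    {"brand": "A", "date": "2013-01-01", "net_sales": 100.0},
    {"brand": "A", "date": "2013-01-02", "net_sales": 50.0},
    {"brand": "B", "date": "2013-02-01", "net_sales": 70.0},
]


def roll_up_to_month(facts, date_dim):
    """Roll the date dimension up from day to month by summing the measure."""
    totals = defaultdict(float)
    for fact in facts:
        month = date_dim[fact["date"]]["month"]
        totals[(fact["brand"], month)] += fact["net_sales"]
    return dict(totals)


monthly = roll_up_to_month(facts, date_dim)
```

Summation is only meaningful here if every fact row was computed with the same measure definition, which is the assumption that SCM questions.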


SCD : Slowly Changing Dimensions

preserve the referential integrity between facts and dimensions while supporting changes in dimension members

type 0 ~ type 6

type 1 : overwrite (data update)

001, Gahee Lee, 010-1234-1234, …

001, Gahee Lee, 010-1234-5678, …

type 2 : additional member tuple

001, Gahee Lee, 010-1234-1234, …

01, 001, Gahee Lee, 010-1234-1234, …

02, 001, Gahee Lee, 010-1234-5678, …

type 3 : additional old column

001, Gahee Lee, 010-1234-1234, …

001, Gahee Lee, 010-1234-1234, 010-1234-5678, …

data update

surrogate key; history can be tracked
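The three SCD types on this slide can be sketched in a few lines of Python using the same phone-number change. The row layout and the field names (`sk` for the surrogate key, `id`, `phone`, `old_phone`) are illustrative assumptions; the slides only show the tuples.

```python
from copy import deepcopy


def scd_type1(rows, member_id, new_phone):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    out = deepcopy(rows)
    for row in out:
        if row["id"] == member_id:
            row["phone"] = new_phone
    return out


def scd_type2(rows, member_id, new_phone):
    """Type 2: append a new member tuple under a fresh surrogate key."""
    out = deepcopy(rows)
    current = [row for row in out if row["id"] == member_id][-1]
    next_sk = max(row["sk"] for row in out) + 1
    out.append({**current, "sk": next_sk, "phone": new_phone})
    return out


def scd_type3(rows, member_id, new_phone):
    """Type 3: keep the previous value in an additional column."""
    out = deepcopy(rows)
    for row in out:
        if row["id"] == member_id:
            row["old_phone"] = row["phone"]
            row["phone"] = new_phone
    return out


member = [{"sk": 1, "id": "001", "name": "Gahee Lee", "phone": "010-1234-1234"}]
t1 = scd_type1(member, "001", "010-1234-5678")
t2 = scd_type2(member, "001", "010-1234-5678")
t3 = scd_type3(member, "001", "010-1234-5678")
```

Only the type 2 result keeps both phone numbers as separate member tuples, which is why facts can keep pointing at the surrogate key that was valid when they were loaded.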


SCD : Slowly Changing Dimensions

preserve a history

tracing and reconstructing

the measurement functions might change over time

the summarized scores computed from the underlying events


SCM : Slowly Changing Measures

SCM as an additional DW modeling concept

manages updates to the measure definitions while ensuring consistent measure semantics

handles these updates to the measure definition mostly at the instance level

prevents incomparable measures

choose the most appropriate option according to the analysis requirements (type 0 ~ type 3)

avoids excessive schema updates despite regular changes


example scenario

data : POS data, sentiment data

Measure function : sentiment, net sales

Message score : Search string, ID, Date, Sentiment

Data mart : Brand, Date, Net sales, Sentiment


example scenario

change (after day X) the internal booking of the discount

opinion word list

changing the net sales semantics


example scenario

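Brand A : net sales appear to fall because of negative customer sentiment; this is a wrong conclusion, a mere coincidence

Although the word "sick" is mentioned in both messages, they receive different sentiment values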
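The "sick" problem can be reproduced with a toy sentiment function whose opinion word list changes. The word lists, the polarity values and the direction of the change (negative before, positive after) are assumptions chosen for illustration; the slides only state that the same word ends up with different sentiment values.

```python
# Hypothetical opinion word lists before and after the change on day X.
OPINION_WORDS_V1 = {"sick": -1, "great": 1, "bad": -1}
OPINION_WORDS_V2 = {**OPINION_WORDS_V1, "sick": 1}  # assumed re-rating of "sick"


def sentiment(message, opinion_words):
    """Sum the polarity of every known opinion word in the message."""
    return sum(opinion_words.get(word, 0) for word in message.lower().split())


message = "this phone is sick"
before = sentiment(message, OPINION_WORDS_V1)
after = sentiment(message, OPINION_WORDS_V2)
```

Here `before` and `after` differ for an identical message, so sentiment values stored before and after day X are not directly comparable unless the change is tracked.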


SCM

Type 0

Type 1

Type 2

Type 3


SCM : type 0

Conscious Do-Nothing (overwriting nothing)

unchanged semantics with a changed definition

only use for well-justified untracked changes

buzzword problem


SCM

type 0 cannot solve the problems (measurement function)

should strictly be avoided

type 1 (SCD : overwrite old value)

applies the most recent measurement function

ignoring any previous definition

re-compute, re-score

problems are solved, but recomputation is rather expensive


SCM : type 1

initial load (recompute, overwrite measures)

previous state gets lost, only recoverable with retention of operational data

no history needed

new preferable scoring function

error correction
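A minimal sketch of SCM type 1, assuming the operational messages are still retained: the data mart is simply rebuilt with the newest scoring function and the old measure values are overwritten. The function names (`make_scorer`, `initial_load`), the word lists and the sample messages are hypothetical.

```python
from collections import defaultdict

# Hypothetical opinion word lists; V2 is the new, preferred definition.
WORDS_V1 = {"sick": -1, "great": 1}
WORDS_V2 = {"sick": 1, "great": 1}


def make_scorer(opinion_words):
    """Build a scoring function from an opinion word list."""
    def score(text):
        return sum(opinion_words.get(word, 0) for word in text.lower().split())
    return score


def initial_load(messages, score):
    """Recompute the sentiment measure per (brand, date) from raw messages."""
    mart = defaultdict(int)
    for message in messages:
        mart[(message["brand"], message["date"])] += score(message["text"])
    return dict(mart)


# Retained operational data; without it a type 1 change cannot be applied.
messages = [
    {"brand": "A", "date": "2013-01-01", "text": "sick design"},
    {"brand": "A", "date": "2013-01-02", "text": "sick and great"},
]

mart = initial_load(messages, make_scorer(WORDS_V1))
mart = initial_load(messages, make_scorer(WORDS_V2))  # overwrite: old values are gone
```

Every fact is re-scored, which is where the cost noted on the slide comes from; in exchange, all values in the mart follow one definition.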


SCM

type 2 (SCD : additional member tuple)

flagging, version dimension : added to the cube schema

simplicity : only an additional version tuple is created

no recomputation

reduces the flexibility of OLAP analysis

roll-up over the version flag is forbidden

historical state of a measure is significant

multi-version queries comparing several versions of the measure are unnecessary


SCM : type 2

proactive versioning (version flag / dimension)

previous state remains untouched, is fully trackable
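SCM type 2 can be sketched as a fact table with an extra version key: new facts are loaded under a new version, existing rows stay untouched, and aggregation always keeps the version in the grouping key. The field names and sample values are assumptions for illustration.

```python
from collections import defaultdict

# Existing facts, scored with measure definition version 1 (hypothetical values).
facts = [
    {"brand": "A", "date": "2013-01-01", "version": 1, "sentiment": -1},
    {"brand": "A", "date": "2013-01-02", "version": 1, "sentiment": 0},
]


def load_fact(facts, brand, date, version, sentiment):
    """Append a fact tagged with the measure version used to compute it."""
    facts.append(
        {"brand": brand, "date": date, "version": version, "sentiment": sentiment}
    )


def sentiment_by_brand_and_version(facts):
    """Aggregate over dates only; the version is never rolled up."""
    totals = defaultdict(int)
    for fact in facts:
        totals[(fact["brand"], fact["version"])] += fact["sentiment"]
    return dict(totals)


# After the definition changes, new facts arrive under version 2.
load_fact(facts, "A", "2013-01-03", 2, 2)
totals = sentiment_by_brand_and_version(facts)
```

Nothing is recomputed, but a single total per brand would mix two definitions, which is why the version has to stay in every query.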


SCM

type 3 (SCD : additional old column)

preserve the full history of measures

minimize OLAP queries

old attribute name = new attribute name

both the original and the current version are needed

infrequent change only


SCM : type 3

lazy amendment (additional measure attribute)
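One possible reading of SCM type 3, sketched under assumptions: each fact row gains an additional measure attribute that keeps the original value, while the existing attribute holds the value under the current definition. The attribute name `sentiment_original`, the helper `amend_measure` and the sample values are hypothetical.

```python
def amend_measure(rows, recomputed):
    """Keep the original value in a new attribute and store the current one.

    `recomputed` maps (brand, date) to the value under the current definition;
    rows without an entry keep their value unchanged.
    """
    amended = []
    for row in rows:
        key = (row["brand"], row["date"])
        amended.append(
            {
                **row,
                "sentiment_original": row["sentiment"],
                "sentiment": recomputed.get(key, row["sentiment"]),
            }
        )
    return amended


rows = [
    {"brand": "A", "date": "2013-01-01", "sentiment": -1},
    {"brand": "A", "date": "2013-01-02", "sentiment": 0},
]
amended = amend_measure(rows, {("A", "2013-01-01"): 1})
```

Each further change would need yet another attribute, which fits the slide's remark that this option suits infrequent changes only.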


conclusion

SCM by documenting the changes in measure semantics

manage changes in measure definitions with a minimum footprint in existing DW models

handled mostly at the instance level

avoiding major revisions of the physical DW

type 0 ~ type 3

offers DW designers a set of design solutions

standard ROLAP technology, manageable DW models


thank you
