Slowly Changing Measures Mathias Goller, Stefan Berger Johannes Kepler University, Austria, 2013 Data mining Lab. Gahee Lee 2015.07.14


Slowly Changing Measures

Mathias Goller, Stefan Berger

Johannes Kepler University, Austria, 2013

Data mining Lab. Gahee Lee

2015.07.14


contents

introduction

SCD

example scenario

SCM

conclusion


introduction

DW - star schema (fact, dimension)

Fact data : summarized quantitative measurements

Dimension data : qualitative attributes stored along with the measures

Members : the instances of dimensions

Dimensions compose the multi-dimensional analysis space needed to aggregate the measures of the (hyper-)cube data inside the DW.

OLAP : provides business analysts with typical operations

roll-up, …

update dimension data (measure change)

SCD : dimensional modeling of DWs

SCM (proposed method)
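
A minimal sketch of the star-schema terms above, assuming a tiny in-memory fact table. The rows, the brand and date values, and the `roll_up` helper are hypothetical and only illustrate how a measure is aggregated along a dimension.

```python
from collections import defaultdict

# Hypothetical fact rows: (brand, date, net_sales).
# brand and date are dimension members, net_sales is the measure.
facts = [
    ("A", "2013-01-01", 100.0),
    ("A", "2013-01-02", 80.0),
    ("B", "2013-01-01", 50.0),
]


def roll_up(rows, keep_index):
    """Sum the measure per member of the dimension at keep_index."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[keep_index]] += row[2]
    return dict(totals)


# Roll up over date, keeping only the brand dimension.
by_brand = roll_up(facts, 0)
```

Here `by_brand` holds one aggregated net sales value per brand; rolling up is only meaningful when every summed value was produced by the same measure definition.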


SCD : Slowly Changing Dimensions

preserve the referential integrity between facts and dimensions while supporting changes in dimension members

type 0 ~ type 6

type 1 : overwrite (data update)

001, Gahee Lee, 010-1234-1234, …

001, Gahee Lee, 010-1234-5678, …

type 2 : additional member tuple

001, Gahee Lee, 010-1234-1234, …

01, 001, Gahee Lee, 010-1234-1234, …

02, 001, Gahee Lee, 010-1234-5678, …

type 3 : additional old column

001, Gahee Lee, 010-1234-1234, …

001, Gahee Lee, 010-1234-1234, 010-1234-5678, …

data update (type 1)

surrogate key, history can be managed (type 2)
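
The three SCD types above can be sketched as follows. This is an illustration only: the dict layout, the field names (`sk`, `id`, `phone`, `old_phone`) and the helper functions are assumptions, not part of the original slides.

```python
def scd_type1(rows, member_id, new_phone):
    """Type 1: overwrite the value in place; the old value is lost."""
    for row in rows:
        if row["id"] == member_id:
            row["phone"] = new_phone


def scd_type2(rows, member_id, new_phone):
    """Type 2: append a new member tuple under a new surrogate key."""
    latest = [r for r in rows if r["id"] == member_id][-1]
    next_key = max(r["sk"] for r in rows) + 1
    rows.append({**latest, "sk": next_key, "phone": new_phone})


def scd_type3(rows, member_id, new_phone):
    """Type 3: keep the previous value in an additional column."""
    for row in rows:
        if row["id"] == member_id:
            row["old_phone"] = row["phone"]
            row["phone"] = new_phone


def member():
    # Hypothetical dimension table with a single member.
    return [{"sk": 1, "id": "001", "phone": "010-1234-1234"}]


t1, t2, t3 = member(), member(), member()
scd_type1(t1, "001", "010-1234-5678")
scd_type2(t2, "001", "010-1234-5678")
scd_type3(t3, "001", "010-1234-5678")
```

After the calls, `t1` has one row with only the new number, `t2` has two rows with surrogate keys 1 and 2, and `t3` has one row carrying both numbers.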


SCD : Slowly Changing Dimensions

preserve a history

tracing and reconstructing

the measurement functions, which compute the summarized scores from the underlying events, might change over time


SCM : Slowly Changing Measures

SCM as an additional DW modeling concept

manages updates to the measure definitions while ensuring consistent measure semantics

handles these updates mostly at the instance level

prevents incomparable measures

lets designers choose the most appropriate option according to the analysis requirements (type 0 ~ type 3)

avoids excessive schema updates despite regular changes


example scenario

data : POS data, sentiment data

Measure function : sentiment, net sales

Message score : Search string, ID, Date, Sentiment

Data mart : Brand, Date, Net sales, Sentiment


example scenario

changes (after day X)

the internal booking of the discount, changing the net sales semantics

the opinion word list


example scenario

Brand A : net sales seem to decrease because of negative customer opinions; this is a wrong conclusion, a coincidence

messages get different sentiment values even though both mention the word "sick"
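
A small sketch of how a changed opinion word list makes scores incomparable. The word lists, the polarity values and the `score` function are assumptions made for illustration; the slides only state that the word "sick" ends up with different sentiment values.

```python
def score(message, opinion_words):
    """Sum the polarity of every known opinion word in the message."""
    return sum(opinion_words.get(word, 0) for word in message.lower().split())


# Hypothetical opinion word lists before and after day X.
words_before_x = {"sick": -1, "bad": -1, "great": 1}
words_after_x = {"sick": 1, "bad": -1, "great": 1}

message = "this phone is sick"
before = score(message, words_before_x)
after = score(message, words_after_x)
```

The same message is scored negatively before day X and positively after it, so aggregating both values into one sentiment measure mixes two definitions.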


SCM

Type 0

Type 1

Type 2

Type 3


SCM : type 0

Conscious Do-Nothing (overwriting nothing)

unchanged semantics with a changed definition

only use for well-justified untracked changes

buzzword problem


SCM

type 0 cannot solve the problems (measurement function)

should strictly be avoided

type 1 (SCD : overwrite old value)

uses the most recent measurement function

ignoring any previous definition

re-compute, re-score

problems are solved, but recomputation is rather expensive


SCM : type 1

initial load (recompute, overwrite measures)

previous state gets lost, only recoverable with retention of operational data

no history needed

new preferable scoring function

error correction
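
A minimal sketch of type 1, assuming the source messages are retained so that they can be re-scored. The scoring functions, the message tuples and `load_type1` are hypothetical.

```python
def score_old(text):
    """Earlier (assumed) definition: 'sick' counts as negative."""
    return -1 if "sick" in text.lower().split() else 0


def score_new(text):
    """Most recent (assumed) definition: 'sick' counts as positive."""
    return 1 if "sick" in text.lower().split() else 0


# Retained operational data: (brand, date, message text).
retained_messages = [
    ("A", "day X-1", "this phone is sick"),
    ("A", "day X+1", "sick design"),
]


def load_type1(messages, score_fn):
    """Recompute every measure value from the retained messages."""
    mart = {}
    for brand, date, text in messages:
        key = (brand, date)
        mart[key] = mart.get(key, 0) + score_fn(text)
    return mart


mart = load_type1(retained_messages, score_old)
# After the definition changes, everything is re-scored and overwritten.
mart = load_type1(retained_messages, score_new)
```

All values now follow one definition and are comparable, at the cost of a full recomputation; the earlier scores are gone from the mart.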


SCM

type 2 (SCD : additional member tuple)

flagging, version dimension : added to the cube schema

simplicity : only an additional version tuple is created

no recomputation

reduces the flexibility of OLAP analysis

roll-up over the version flag is forbidden

historical state of a measure is significant

multi-version queries comparing several versions of the measure are unnecessary


SCM : type 2

proactive versioning (version flag / dimension)

previous state remains untouched and is fully trackable
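
A minimal sketch of type 2, assuming each fact row carries a version flag that names the measure definition used to compute it. The row layout and the helpers `load_type2` and `sentiment_by_brand` are hypothetical.

```python
mart = []


def load_type2(brand, date, sentiment, version):
    """Append a fact row flagged with the measure definition version."""
    mart.append(
        {"brand": brand, "date": date, "sentiment": sentiment, "version": version}
    )


def sentiment_by_brand(brand, version):
    """Aggregate within exactly one version; never across versions."""
    if version is None:
        raise ValueError("roll-up over the version flag is not allowed")
    return sum(
        row["sentiment"]
        for row in mart
        if row["brand"] == brand and row["version"] == version
    )


load_type2("A", "day X-1", -1, version=1)  # scored with the old definition
load_type2("A", "day X+1", 1, version=2)   # scored with the new definition
```

Existing rows are never touched and nothing is recomputed; the price is that every query has to fix the version before aggregating.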


SCM

type 3 (SCD : additional old column)

preserve the full history of measures

minimize OLAP queries

old attribute name = new attribute name

both original and current version needed

infrequent changes only


SCM : type 3

lazy amendment (additional measure attribute)
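
A minimal sketch of type 3, assuming the fact rows keep enough source data (here a `text` field) to compute the additional attribute. The attribute name `sentiment_current`, the scoring function and `add_measure_attribute` are hypothetical.

```python
def score_new(text):
    """Assumed current definition: 'sick' counts as positive."""
    return 1 if "sick" in text.lower().split() else 0


# Hypothetical fact rows holding the originally computed measure.
rows = [
    {"brand": "A", "date": "day X-1", "text": "this phone is sick", "sentiment": -1},
]


def add_measure_attribute(fact_rows, attribute, score_fn):
    """Store the newly defined measure next to the original one."""
    for row in fact_rows:
        row[attribute] = score_fn(row["text"])


add_measure_attribute(rows, "sentiment_current", score_new)
```

Each row then carries both the original and the current value, so a single query can compare them; every further change of the definition adds another attribute, which is why this only suits infrequent changes.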


conclusion

SCM : documenting the changes in measure semantics

manage changes in measure definitions with a minimum footprint in existing DW models

handled mostly at the instance level

avoiding major revisions of the physical DW

type 0 ~ type 3

give DW designers a set of design solutions

standard ROLAP technology, manageable DW models


thank you