Slowly Changing Measures
Mathias Goller, Stefan Berger
Johannes Kepler University, Austria, 2013
Data mining Lab. Gahee Lee
2015.07.14
2
contents
introduction
SCD
example scenario
SCM
conclusion
3
introduction
DW - star schema (fact, dimension)
Fact data : summarized quantitative measurements
Dimension data : qualitative attributes stored along with the measures
Members : The instances of dimensions
compose the multi-dimensional analysis space needed to aggregate the measures of the (hyper-)cube data inside the DW.
OLAP : provides business analysts with typical operations
roll-up, …
update dimension data (measure change)
SCD : dimensional modeling of DWs
SCM (proposed method)
4
SCD : Slowly Changing Dimensions
preserve the referential integrity between facts and dimensions while supporting changes in dimension members
type 0 ~ type 6
type 1 : overwrite (data update)
001, 이가희 , 010-1234-1234, …
001, 이가희 , 010-1234-5678, …
type 2 : additional member tuple
001, 이가희 , 010-1234-1234, …
01, 001, 이가희 , 010-1234-1234, …
02, 001, 이가희 , 010-1234-5678, …
type 3 : additional old column
001, 이가희 , 010-1234-1234, …
001, 이가희 , 010-1234-1234, 010-1234-5678, …
data update
surrogate key, history can be managed
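The three SCD variants listed on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the dimension table is modeled as a list of dicts, and the function names, the `sk` surrogate-key column, and the `old_phone` column are hypothetical names chosen here, not taken from the slides. The member id and phone numbers reuse the slide's example values.

```python
from copy import deepcopy

# Hypothetical dimension table: one member row, as in the slide example.
base = [{"id": "001", "phone": "010-1234-1234"}]

def scd_type1(rows, member_id, new_phone):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    rows = deepcopy(rows)
    for row in rows:
        if row["id"] == member_id:
            row["phone"] = new_phone
    return rows

def scd_type2(rows, member_id, new_phone):
    """Type 2: keep the old rows and append a new row with a fresh surrogate key."""
    # Assign surrogate keys 1..n to rows that do not have one yet (assumption).
    rows = [dict(row, sk=row.get("sk", i + 1)) for i, row in enumerate(rows)]
    next_sk = max((row["sk"] for row in rows), default=0) + 1
    rows.append({"sk": next_sk, "id": member_id, "phone": new_phone})
    return rows

def scd_type3(rows, member_id, new_phone):
    """Type 3: keep a single row and move the old value into an extra column."""
    rows = deepcopy(rows)
    for row in rows:
        if row["id"] == member_id:
            row["old_phone"] = row["phone"]
            row["phone"] = new_phone
    return rows
```

Only the type 2 and type 3 variants leave the earlier phone number recoverable from the dimension table itself.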
5
SCD : Slowly Changing Dimensions
preserve a history
tracing and reconstructing
the measurement functions might change over time
the summarized scores computed from the underlying events
6
SCM : Slowly Changing Measures
SCM as an additional DW modeling concept
manage updates to the measure definitions while ensuring consistent measure semantics
handles these updates mostly at the instance level measure definition
prevents incomparable measures
choose the most appropriate option according to the analysis requirements (type 0 ~ type 3)
avoids excessive schema updates despite regular changes
7
example scenario
data : POS data, sentiment data
Measure function : sentiment, net sales
Message score : Search string, ID, Date, Sentiment
Data mart : Brand, Date, Net sales, Sentiment
8
example scenario
change (after day X) the internal booking of the discount
opinion word list
changing the net sales semantics
9
example scenario
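Brand A : net sales decreased in line with negative customer opinions; this is a wrong conclusion, a coincidence
Although the word "sick" is mentioned in both messages, they receive different sentiment values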
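The effect described here, where the same word yields different sentiment values, can be reproduced with a toy scoring function. The sketch below is an assumption for illustration only: the word lists, their polarities, and the `sentiment` function are hypothetical and do not come from the presented paper.

```python
# Hypothetical opinion word lists; the polarities are illustrative assumptions.
WORDS_BEFORE = {"sick": -1, "bad": -1, "great": 1}
WORDS_AFTER = {"sick": 1, "bad": -1, "great": 1}  # "sick" re-rated as positive slang

def sentiment(message, word_list):
    """Sum the polarity of every known opinion word in the message."""
    return sum(word_list.get(token, 0) for token in message.lower().split())

msg = "This phone is sick"
score_before = sentiment(msg, WORDS_BEFORE)  # scored with the old word list
score_after = sentiment(msg, WORDS_AFTER)    # scored with the changed word list
```

The same message gets opposite signs under the two lists, so scores computed before and after the change are not directly comparable.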
10
SCM
Type 0
Type 1
Type 2
Type 3
11
SCM : type 0
Conscious Do-Nothing (overwriting nothing)
unchanged semantics with a changed definition
only use for well-justified untracked changes
buzzword problem
12
SCM
type 0 cannot solve the problems (measurement function)
should strictly be avoided
type 1 (SCD : overwrite old value)
uses only the most recent measurement function
ignoring any previous definition
re-compute, re-score
problems are solved, but recomputation is rather expensive
13
SCM : type 1
initial load (recompute, overwrite measures)
previous state gets lost, only recoverable with retention of operational data
no history needed
new preferable scoring function
error correction
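A minimal sketch of the type 1 idea (recompute and overwrite), assuming the raw messages are still retained. The sample messages, both scoring functions, and the `initial_load` helper are hypothetical stand-ins, not the scoring used in the paper.

```python
# Retained operational data: message id -> raw text (hypothetical sample).
messages = {1: "sick of this brand", 2: "that design is sick"}

def score_v1(text):
    """Old definition: the word "sick" is always negative."""
    return -1 if "sick" in text.split() else 0

def score_v2(text):
    """New definition: the phrase "is sick" counts as praise."""
    if "sick" not in text.split():
        return 0
    return 1 if "is sick" in text else -1

def initial_load(raw, score_fn):
    """Recompute every measure from the raw data with one scoring function."""
    return {msg_id: score_fn(text) for msg_id, text in raw.items()}

cube_old = initial_load(messages, score_v1)
cube = initial_load(messages, score_v2)  # would overwrite the stored measures
```

In a real type 1 load only `cube` would be kept. The old values can be rebuilt only because `messages` was retained, and every stored measure has to be recomputed, which is where the cost comes from.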
14
SCM
type 2 (SCD : additional member tuple)
flagging, version dimension : added to the cube schema
simplicity : only an additional version tuple is created
no recomputation
reduces the flexibility of OLAP analysis
roll-up over the version flag is forbidden
historical state of a measure is significant
multi-version queries comparing several versions of the measure are unnecessary
15
SCM : type 2
proactive versioning (version flag / dimension)
previous state remains untouched, is fully trackable
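The version flag can be pictured as one more key on every fact row. The following sketch rests on assumptions: the fact rows, their values, and the `sentiment_by_brand` helper are hypothetical. It aggregates per brand while keeping the version in the grouping key, so measures from different definitions are never summed together.

```python
from collections import defaultdict

# Hypothetical fact rows; "version" refers to a member of the version dimension.
facts = [
    {"brand": "A", "date": "d1", "version": 1, "sentiment": -2},
    {"brand": "A", "date": "d2", "version": 1, "sentiment": -1},
    {"brand": "A", "date": "d3", "version": 2, "sentiment": 3},
]

def sentiment_by_brand(rows):
    """Roll up over date only; the version stays part of the grouping key."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["brand"], row["version"])] += row["sentiment"]
    return dict(totals)

totals = sentiment_by_brand(facts)
```

Old rows are never rewritten, which is why no recomputation is needed. The price is that every query has to carry the version in its grouping or filter.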
16
SCM
type 3 (SCD : additional old column)
preserve the full history of measures
minimize OLAP queries
old attribute name = new attribute name
both the original and the current version are needed
infrequent change only
17
SCM : type 3
lazy amendment (additional measure attribute)
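As a rough sketch of an additional measure attribute, the fact row below keeps its original measure and gains a second attribute for the current definition. The column name `sentiment_current`, the `amend` helper, and the sample values are assumptions made for this illustration; the slides do not state which attribute holds which version.

```python
def amend(fact, current_value):
    """Return a copy of the fact with the current measure in an extra attribute."""
    amended = dict(fact)                            # original attributes stay untouched
    amended["sentiment_current"] = current_value    # hypothetical added attribute
    return amended

# Hypothetical fact row scored with the original definition.
fact = {"brand": "A", "date": "d1", "sentiment": -1}
amended = amend(fact, 1)
```

Both versions can then be read from the same row without a join or a version filter, but each further change of the definition would need another attribute, which fits the slide's note that this suits infrequent changes only.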
19
conclusion
SCM by documenting the changes in measure semantics
manage changes in measure definitions with a minimum footprint in existing DW models
handled mostly at the instance level
avoiding major revisions of the physical DW
type 0 ~ type 3
offers DW designers a set of design solutions
standard ROLAP technology, manageable DW models
20
thank you