Introduction Mining for Blocks Multiple-Level Multidimensional Sequential Patterns Conclusion and perspectives
OLAP Mining: Mining Multidimensional Data
EXPEDO
LIRMM, UNIVERSITÉ MONTPELLIER II, FRANCE
ETIS, UNIVERSITÉ CERGY-PONTOISE, FRANCE
HELP UC, KUALA LUMPUR, MALAYSIA
Feb. 20-21 2007
EXPEDO LIRMM-ETIS-HELP UC
OLAP Mining
Outline
1 Introduction
    OLAP and Data Mining
    Research Topics on OLAP Mining (EXPEDO)
2 Mining for Blocks
    Fuzzy and Crisp Blocks
    Generating Blocks
    Managing Hierarchies
    Visualizing Blocks
3 Multiple-Level Multidimensional Sequential Patterns
    Multidimensional Sequential Patterns
    Multiple Level MSP
    Implementation
4 Conclusion and perspectives
OLAP and KDD
OLTP vs. OLAP
OLAP Users
    Decision makers
    Complex queries
Current Uses
    OLAP framework : mainly provides navigation and reporting tools (pull)
    Need for Data Mining (push)
OLAP Mining
First introduced in 1997 by Jiawei Han as "a mechanism which integrates OLAP with data mining so that mining can be performed in different portions of databases or data warehouses and at different levels of abstraction at user's finger tips"
Specificities of the OLAP Framework
On-line analysis
measures described by means of dimensions
aggregated measure values
hierarchies
displaying data : the order matters (switch, pivot)
Motivating Example
First representation :

          Beer  Water  Soda  Wine  Milk
Europe      4     4     7     6     5
America     4     5     7     7     6
Asia        3     3     6     5     5
Africa      2     2     6     5     4

Second representation (same data, rows and columns reordered) :

          Beer  Water  Milk  Wine  Soda
America     4     5     6     7     7
Europe      4     4     5     6     7
Asia        3     3     5     5     6
Africa      2     2     4     5     6
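The two tables of the motivating example hold the same cells; only the order of the dimension members differs. A minimal Python sketch that rebuilds the second table from the first one (the helper name `reorder` is hypothetical, introduced only for this illustration):

```python
# First representation from the slide: regions x beverages.
rows = ["Europe", "America", "Asia", "Africa"]
cols = ["Beer", "Water", "Soda", "Wine", "Milk"]
data = [
    [4, 4, 7, 6, 5],
    [4, 5, 7, 7, 6],
    [3, 3, 6, 5, 5],
    [2, 2, 6, 5, 4],
]

def reorder(rows, cols, data, new_rows, new_cols):
    """Return the same table with rows and columns permuted (hypothetical helper)."""
    r = [rows.index(name) for name in new_rows]
    c = [cols.index(name) for name in new_cols]
    return [[data[i][j] for j in c] for i in r]

new_rows = ["America", "Europe", "Asia", "Africa"]
new_cols = ["Beer", "Water", "Milk", "Wine", "Soda"]
second = reorder(rows, cols, data, new_rows, new_cols)

for name, line in zip(new_rows, second):
    print(f"{name:8s}", *line)
```

In the second ordering the values never decrease along a row and never increase down a column, which is what makes the homogeneous areas easier to spot.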
Representing Cubes
Several ways to represent the same data
Finding the best representation is known to be NP-hard
Hierarchies
Using Hierarchies
Representativity of extracted Knowledge
high : nothing can be extracted (or only trivial knowledge)
low : too many patterns extracted, of no use to the decision makers
It is difficult to choose the level of granularity that yields relevant knowledge
Taking Hierarchies into account
Extracting rules at different levels of the hierarchies
Subrules are automatically discovered (thanks to anti-monotonicity)
Research Topics on OLAP Mining (EXPEDO)
Research Topics from EXPEDO
Topics addressed by the project
    Mining for rules (e.g. association rules, gradual rules, sequential patterns)
    Mining for homogeneous parts and compressing (e.g. blocks)
    Navigating by means of intelligent queries
To be addressed in this talk
    Mining for Blocks
    Mining for Multidimensional Sequential Patterns
Why Blocks ?
Impossible to mine for the best representation
Different kinds of relevant representations
Other criteria may be considered : pointing out homogeneous parts
What are Blocks ?
    Blocks are subcubes defined over all dimensions
    Some dimensions may appear completely : ALL level
    Blocks must be large enough (Support)
    Blocks must be homogeneous enough (Confidence)
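The slide names support and confidence without defining them. The sketch below assumes one plausible reading: support is the share of all cube cells that lie in the block and carry the block value, and confidence is the share of the block's own cells that carry it. Both definitions, the function name `block_stats` and the toy cube are assumptions made for illustration, not taken from the talk.

```python
from itertools import product

def block_stats(cube, block, m):
    """cube: dict mapping (product, city) -> measure value.
    block: one set of members per dimension; m: the block value.
    Assumed definitions (not given on the slide):
      support    = matching cells in the block / cells in the cube
      confidence = matching cells in the block / cells in the block
    """
    cells = list(product(*block))
    matching = sum(1 for c in cells if cube.get(c) == m)
    return matching / len(cube), matching / len(cells)

# Toy 3 x 4 cube, invented for this example.
values = {
    "P1": [8, 8, 5, 2],
    "P2": [8, 6, 5, 2],
    "P3": [6, 6, 5, 2],
}
cities = ["C1", "C2", "C3", "C4"]
cube = {(p, c): v for p, row in values.items() for c, v in zip(cities, row)}

support, confidence = block_stats(cube, ({"P1", "P2"}, {"C1", "C2"}), 8)
print(support, confidence)
```

Under these assumptions the block {P1, P2} x {C1, C2} holds three cells equal to 8 out of four, so it would pass a confidence threshold of 0.7 but fail one of 0.8.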
[Figure: example cube over PRODUCT (P1 to P4) and CITY (C1 to C6) with its measure values]
Block Value
The number of distinct measure values may be large, which prevents blocks from being discovered
Fuzzy and Crisp Blocks
Partitioning the measure : Crisp Blocks
[Figure: the example cube with real-valued measures (1.8 to 8.2) and a crisp partition of the measure axis from 0 to 10, with marks at 0, 2, 5, 8, 10]
Partitioning the measure : Fuzzy Blocks
[Figure: the example cube with real-valued measures (1.8 to 8.2) and a fuzzy partition of the measure axis from 0 to 10, with marks at 0, 2, 5, 8, 10]
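The difference between a crisp and a fuzzy partition of the measure can be sketched as follows. The interval bounds and the trapezoidal shape are assumptions chosen for illustration; the figures do not survive in this transcript, so the actual partitions used in the talk are not known.

```python
def crisp_membership(x, low, high):
    """1.0 if x falls in the crisp interval [low, high), else 0.0."""
    return 1.0 if low <= x < high else 0.0

def trapezoid(x, a, b, c, d):
    """Trapezoidal fuzzy membership: 0 outside (a, d), 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Hypothetical class "around 8": crisp [7.5, 8.5), fuzzy trapezoid (7, 7.5, 8.5, 9).
for x in (7.25, 7.8, 8.2):
    print(x, crisp_membership(x, 7.5, 8.5), trapezoid(x, 7.0, 7.5, 8.5, 9.0))
```

With a crisp partition a value just outside the interval is lost for the block; with a fuzzy partition it still contributes with a partial degree.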
Generating Blocks
Given a k-dimensional cube C, a support threshold σ and a confidence threshold γ.
For each measure value m :
1 For every dimension, compute all maximal intervals of values containing enough cells with measure value m
2 Combine the intervals in a level-wise manner
3 Considering the set of all blocks computed in the previous step, discard those that are not minimal with respect to the inclusion ordering, and then those having a confidence for m less than or equal to γ
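Step 1 can be sketched as below, under one interpretation: a slice (one member of one dimension) is kept when it holds enough cells equal to m, and a maximal interval is a maximal run of consecutive kept members. The absolute threshold `min_count` stands in for the support threshold σ, and the function name and toy cube are hypothetical.

```python
def maximal_intervals(cube, axis, members, m, min_count):
    """cube: dict mapping coordinate tuples to measure values.
    axis: index of the dimension; members: its ordered list of members.
    A slice is kept when it holds at least min_count cells equal to m
    (an absolute-count stand-in for the support threshold, assumed here).
    Returns maximal runs of consecutive kept members as (first, last) pairs.
    """
    counts = {v: 0 for v in members}
    for coord, value in cube.items():
        if value == m:
            counts[coord[axis]] += 1
    intervals, start, prev = [], None, None
    for v in members:
        if counts[v] >= min_count:
            if start is None:
                start = v
            prev = v
        elif start is not None:
            intervals.append((start, prev))
            start = None
    if start is not None:
        intervals.append((start, prev))
    return intervals

# Toy 4 x 4 cube, invented for this example.
products = ["P1", "P2", "P3", "P4"]
cities = ["C1", "C2", "C3", "C4"]
grid = [
    [8, 8, 5, 8],
    [8, 8, 5, 2],
    [6, 5, 5, 8],
    [2, 2, 5, 8],
]
cube = {(p, c): grid[i][j] for i, p in enumerate(products) for j, c in enumerate(cities)}

print(maximal_intervals(cube, 0, products, 8, 2))
print(maximal_intervals(cube, 1, cities, 8, 2))
```

Step 2 would then combine intervals from different dimensions into candidate blocks, one more dimension per level, before the filtering of step 3.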
Example
[Figure: the PRODUCT (P1 to P4) x CITY (C1 to C6) cube, annotated with counts labelled "# 8" along its slices]
Problem
On real data :
    it may be the case that no slice (or very few) is relevant regarding the support : not enough cells with the measure value being considered
Alternatives :
    Decrease the minimum support value
    Merge slices
Note that, in this case, considering hierarchies leads to semantically-founded merged slices
Note that the support must remain anti-monotonic
Managing Hierarchies
[Figure: a cube over products (water, soda, butter, bread) and cities (City1 to City6), shown twice: once with the product groups Food / Beverage, once with the city groups North / South]
Visualizing Blocks
Users can only have 2D (possibly 3D) views of their data
And thus use projections over some values on the remaining invisible dimensions ...
... But they are interested in knowing about the rest
-> Coloring cells depending on the block information
Example
INTERLUDE on SEQUENTIAL PATTERNS
Sequential Patterns
Relations between events
Partial order on the data (e.g. temporal order)
Many applications : marketing-CRM, decision making, bioinformatics, ...
Sequential patterns only use a small part of the available data (a single dimension)
What about the other dimensions ?
Goal : Combining several dimensions in the patterns
an item from a sequence is defined over several dimensions
    classical item : c
    multidimensional item : (Pakistan, c)
multidimensional sequence :
    〈{(Pakistan, carpet1), (Pakistan, pashmina1)}{(France, carpet1)}〉
    instead of simply 〈(carpet1, pashmina1), carpet1〉
Warning : clients are usually groups
Multidimensional Sequential Patterns
Data Model : Blocks again !
DB partitioned into blocks regarding some dimensions, e.g. customer group. A block is considered as a client.

BlocID  Date       Place     Product
1       January    Pakistan  c1
1       January    Pakistan  c2
1       January    Pakistan  p1
1       March      France    c1
1       March      Pakistan  c1
2       February   UK        p2
2       June       Pakistan  c1
2       June       Pakistan  p2
2       July       France    c1
3       April      Pakistan  p1
3       April      Pakistan  c1
3       September  France    c1
Dimension Partition
D = D_F ⊕ D_R ⊕ D_A ⊕ D_t

D_t : temporal dimensions
D_A : analysis dimensions
D_R : reference dimensions
D_F : forgotten dimensions
Support of a sequence
considering a particular sequence ς
support computed over reference dimensions
Support of ς
support(ς) = (number of blocks supporting ς) / (number of blocks)
Example
D_R = {BlocID}, D_A = {Place, Product} and D_t = {Date}, minsupp = 2 (expressed here as a number of blocks)
Compute the support of ς = 〈{(Pakistan, c1), (Pakistan, p1)}{(France, c1)}〉
ς = 〈{(Pakistan, c1), (Pakistan, p1)}{(France, c1)}〉
Block 1
1 January Pakistan c1
1 January Pakistan c2
1 January Pakistan p1
1 March France c1
1 March Pakistan c1
Block 1 supports ς : support(ς)++

Block 2
2 February UK p2
2 June Pakistan c1
2 June Pakistan p2
2 July France c1
Block 2 does not support ς : it never contains (Pakistan, p1)
Block 3
3 April Pakistan p1
3 April Pakistan c1
3 September France c1
Block 3 supports ς : support(ς)++

support(ς) = 2 ≥ minsupp
ς is frequent
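The counting carried out in this example can be reproduced with a short Python sketch. The table is the one from the "Data Model" slide; the function names and the month-based ordering of dates are choices made for this illustration.

```python
from collections import defaultdict

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Rows of the example table: (BlocID, Date, Place, Product).
DB = [
    (1, "January", "Pakistan", "c1"), (1, "January", "Pakistan", "c2"),
    (1, "January", "Pakistan", "p1"), (1, "March", "France", "c1"),
    (1, "March", "Pakistan", "c1"),
    (2, "February", "UK", "p2"), (2, "June", "Pakistan", "c1"),
    (2, "June", "Pakistan", "p2"), (2, "July", "France", "c1"),
    (3, "April", "Pakistan", "p1"), (3, "April", "Pakistan", "c1"),
    (3, "September", "France", "c1"),
]

def blocks_by_date(db):
    """Group rows by block (D_R), then by date (D_t); items are (Place, Product) pairs (D_A)."""
    blocks = defaultdict(lambda: defaultdict(set))
    for bid, date, place, prod in db:
        blocks[bid][MONTHS.index(date)].add((place, prod))
    return blocks

def supports(block, sequence):
    """True if the itemsets of `sequence` occur in order, at strictly increasing dates."""
    pos = 0
    for date in sorted(block):
        if pos < len(sequence) and sequence[pos] <= block[date]:
            pos += 1
    return pos == len(sequence)

sequence = [{("Pakistan", "c1"), ("Pakistan", "p1")}, {("France", "c1")}]
blocks = blocks_by_date(DB)
count = sum(supports(b, sequence) for b in blocks.values())
print(count, "blocks out of", len(blocks))
```

Blocks 1 and 3 support the sequence, so the count is 2, i.e. a relative support of 2/3 with the ratio definition given earlier.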
Multiple Level MSP
Need for Hierarchies
what if minsup = 3 ? ς would not have been frequent
Using all the dimensions at the lowest granularity level may lead to ... nothing
It is then necessary to consider aggregation
A dimension may then be rolled up, or simply ignored (ALL level)
ς′ = 〈{(Pakistan, c1), (Pakistan, pashmina)}{(France, c1)}〉
Managing Hierarchies
describing hierarchies between elements
only the leaves can appear in the DB

Example of a hierarchy on the dimension PRODUCT :

Product (ALL)
    carpet : c1, c2, ...
    pashmina : p1, p2, ...
    ...
Agrawal & Srikant (1996) :
    taking hierarchies into account
    the database is rewritten by putting the items together with their ancestors (not possible with several dimensions having several levels)
J. Han (2001) :
    extraction of knowledge level by level
    but the mined knowledge concerns only one level at a time
Choong et al. (2005) :
    many dimensions appear in the patterns
    they can appear at any granularity level
Item, itemset and h-generalized sequences
Multidimensional h-generalized Item :
    tuple with labels taken at any granularity level
Examples :
    (Pakistan, c2)
    (EU, p1)
    (EU, pashmina)
Hierarchical Inclusion
Given e = (d_1, ..., d_m) and e′ = (d′_1, ..., d′_m), e can be :
    more general than e′ (e >h e′)
    more specific than e′ (e <h e′)
    not comparable to e′ (e ≯h e′ and e′ ≯h e)
Example :
(Pakistan, carpet) >h (Pakistan, c1)
(France, c4) <h (EU, carpet)
(EU, c1) and (France, carpet) are not comparable
(France, c1) and (Pakistan, c1) are not comparable
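The four comparisons above can be checked with a small sketch of hierarchical inclusion. The links France -> EU and c*/p* -> carpet/pashmina follow the slides; "Asia" as the parent of Pakistan is an assumption added so that every place has a parent, and all function names are hypothetical.

```python
# Child -> parent maps, one per analysis dimension (Place, Product).
PLACE = {"France": "EU", "Pakistan": "Asia",  # "Asia" is an assumption
         "EU": "ALL", "Asia": "ALL"}
PRODUCT = {"c1": "carpet", "c2": "carpet", "c4": "carpet",
           "p1": "pashmina", "p2": "pashmina",
           "carpet": "ALL", "pashmina": "ALL"}
HIERARCHIES = [PLACE, PRODUCT]

def ancestors_or_self(label, parent):
    """Return the label followed by all its ancestors up to the root."""
    out = [label]
    while label in parent:
        label = parent[label]
        out.append(label)
    return out

def more_general(e, e2):
    """e >h e2: each label of e is the matching label of e2 or one of its ancestors, and e != e2."""
    return e != e2 and all(a in ancestors_or_self(b, h)
                           for a, b, h in zip(e, e2, HIERARCHIES))

def compare(e, e2):
    if e == e2:
        return "equal"
    if more_general(e, e2):
        return ">h"
    if more_general(e2, e):
        return "<h"
    return "not comparable"

print(compare(("Pakistan", "carpet"), ("Pakistan", "c1")))
print(compare(("France", "c4"), ("EU", "carpet")))
print(compare(("EU", "c1"), ("France", "carpet")))
print(compare(("France", "c1"), ("Pakistan", "c1")))
```

Two items are comparable only when one of them is at least as general as the other on every dimension, which is why (EU, c1) and (France, carpet) are not.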
Itemset and h-generalized sequence
h-generalized Itemset :
    set of non-comparable items
    {(France, c1), (USA, p1)} YES
    {(France, c1), (France, carpet)} NO because (France, c1) <h (France, carpet)

Multidimensional h-generalized Sequence
    s = 〈i_1, ..., i_j〉 is an ordered non-empty list of multidimensional h-generalized itemsets.
    〈{(India, c1), (Pakistan, c1)}{(EU, c1)}〉
Support
Item supported by a transaction
    A transaction supports an item e if its value is equal to e or under e in the hierarchy
    (Block_1, February, France, c1) supports the item (EU, carpet).
Sequence supported by a block
    A block supports a sequence if all its itemsets are supported (provided that the order is respected)
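A minimal sketch of hierarchy-aware support, applied to the sequence ς′ = 〈{(Pakistan, c1), (Pakistan, pashmina)}{(France, c1)}〉 from the "Need for Hierarchies" slide and to the example table. Only hierarchy links stated on the slides are encoded; the function names and the month-based ordering of dates are assumptions of this illustration.

```python
from collections import defaultdict

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

DB = [
    (1, "January", "Pakistan", "c1"), (1, "January", "Pakistan", "c2"),
    (1, "January", "Pakistan", "p1"), (1, "March", "France", "c1"),
    (1, "March", "Pakistan", "c1"),
    (2, "February", "UK", "p2"), (2, "June", "Pakistan", "c1"),
    (2, "June", "Pakistan", "p2"), (2, "July", "France", "c1"),
    (3, "April", "Pakistan", "p1"), (3, "April", "Pakistan", "c1"),
    (3, "September", "France", "c1"),
]

# Child -> parent links stated on the slides, one map per analysis dimension.
PLACE_PARENT = {"France": "EU"}
PRODUCT_PARENT = {"c1": "carpet", "c2": "carpet", "p1": "pashmina", "p2": "pashmina"}
PARENTS = [PLACE_PARENT, PRODUCT_PARENT]

def covers(label, value, parent):
    """True if `label` equals `value` or is one of its ancestors."""
    while value != label and value in parent:
        value = parent[value]
    return value == label

def item_supported(item, transactions):
    """An h-generalized item is supported if some transaction lies at or under it."""
    return any(all(covers(l, v, p) for l, v, p in zip(item, t, PARENTS))
               for t in transactions)

def block_supports(block, sequence):
    """Match the itemsets in order, each at a strictly later date than the previous one."""
    pos = 0
    for date in sorted(block):
        if pos < len(sequence) and all(item_supported(i, block[date])
                                       for i in sequence[pos]):
            pos += 1
    return pos == len(sequence)

blocks = defaultdict(lambda: defaultdict(set))
for bid, date, place, prod in DB:
    blocks[bid][MONTHS.index(date)].add((place, prod))

seq = [[("Pakistan", "c1"), ("Pakistan", "pashmina")], [("France", "c1")]]
count = sum(block_supports(b, seq) for b in blocks.values())
print(count, "blocks out of", len(blocks))
```

Once p1 is rolled up to pashmina, block 2 also supports the sequence through (Pakistan, p2), so all three blocks support ς′ and it would remain frequent with minsup = 3.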
Implementation
Algorithms
Generation of candidate items
    extract all the maximally specific items
    level-wise generation
Generation of candidate sequences
    anti-monotonicity of the support
    Apriori-like approach (generate - prune)
    Use of a prefix tree to store candidate sequences (PSP)
Experiments
Indicators
    time consumption
    memory consumption
    number of patterns being discovered
Data
    synthetic data
    real data (examples from our industrial collaborations)
        National French Electricity Agency (marketing)
        Follow-up of long-term care patients to improve facilities
        e-couponing (customized coupons sent to mobile phones)
Conclusion
Summary
    OLAP Mining must not merely plug data mining on top of multidimensional databases without considering their specificities
    Discovery of homogeneous parts
    Hierarchy-aware Multidimensional Sequential Patterns
    Work has also been done on fuzzy sequential patterns
Further Work
Next Challenges
Causality
Outliers
Support counting and measure values
Condensed representations
Gradual rules (generalization of the ordered/temporal dimension)
Involve the user
Test on real data