Pattern mining in parallel environments (source: people.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf)
![Page 1: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/1.jpg)
Pattern mining in parallel environments
Alexandre Termier
Université de Rennes 1 – IRISA – Equipe LACODAM
DMV – M2 SIF
![Page 2: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/2.jpg)
Naive introduction
• Pattern mining: find (interesting) patterns in data
  • cf. Marc’s previous courses
• Needs a lot of computing power
  • Exploration of a huge search space
  • Potentially costly pattern interest test
• Nowadays, computing power comes from parallelism
  • Multicore processors
  • Clusters
  • GPUs
=> How to exploit parallel environments for pattern mining?
![Page 3: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/3.jpg)
(Tentative) motivations
What can computing power be used for in pattern mining?
• Mine large datasets
  • Ex: actual supermarket dataset ~ 4 TB
• Mine “troublesome” datasets
  • Ex: bioinformatics, SNP data: ~1000 lines / 5 000 000 columns, 25% density
  • The best current FIS (frequent itemset) algorithms give up around 20 000 columns (yes, LCM too)
• Mine “complex” patterns
  • Graphs: interest = subgraph isomorphism
• Make a finer-grained analysis
  • Usually, reduce the minimum support threshold…
![Page 4: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/4.jpg)
Counter-argument
• Pattern mining outputs millions of patterns
• Few have actual value
• Why bother computing billions of patterns ?
![Page 5: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/5.jpg)
Motivations, take two
• Many solutions to handle pattern overabundance (more on the way)
  • Post-processing
  • Constraints
  • Pattern sets (ex: KRIMP)
  • Statistics-based pattern interest functions
  • …
• Use computing power to find more interesting patterns
  • Efficient parallel pattern space exploration
  • Efficient parallel evaluation of complex pattern interest functions
  • Interactive navigation in pattern space
![Page 6: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/6.jpg)
Parallel environments discussed in this talk
1. Multicore processors
2. Clusters
3. GPUs (a bit)
4. Manycores (some hints)
![Page 7: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/7.jpg)
Parallel performance 101
Slides from Marc Snir (University of Illinois at Urbana-Champaign), taken from the IJCAI Tutorial on Parallel Data Mining
![Page 8: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/8.jpg)
Sometimes Parallelism is Easy
• Painting a fence:
• Time = (picket_painting_time) * (# pickets)/#painters
• Perfect parallelism
Slide from Marc Snir
![Page 9: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/9.jpg)
Up To a Limit
• Task granularity cannot be too small
― Too many painters spoil the fence
Slide from Marc Snir
![Page 10: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/10.jpg)
Sometimes Parallelism Does Not Help
• How many babies can 9 women produce in one month?
Slide from Marc Snir
![Page 11: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/11.jpg)
Some Definitions
• $T_P$ = compute time with $P$ HW threads
• $T_1$ = sequential compute time
• $T_\infty$ = compute time with no limitation on the number of threads = critical path length
• $T_P \ge T_\infty$; $T_P \ge T_1/P$
• Efficient algorithm: $T_P \approx T_1/P$, which requires $T_\infty \ll T_1/P$, i.e. $P \ll T_1/T_\infty$
• Cannot efficiently use more HW threads than the “average width” of the computation, $T_1/T_\infty$
• Example, Amdahl's law: a fraction $\alpha$ of the computation is sequential, $(1-\alpha)$ is fully parallel; then $T_\infty = \alpha T_1$, so ~$1/\alpha$ processors can be used efficiently
  • E.g., 10% of the code is sequential -> should not use more than ~10 HW threads
Slide from Marc Snir
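The Amdahl bound above can be checked numerically. The following is a minimal sketch, assuming the usual model in which the run time with P threads is the sequential part plus the parallel part divided by P; the function names are illustrative and do not come from the slides.

```python
def amdahl_speedup(alpha: float, p: int) -> float:
    """Upper bound on T_1 / T_P when a fraction alpha of the work is sequential.

    Model: T_P = alpha * T_1 + (1 - alpha) * T_1 / p.
    """
    return 1.0 / (alpha + (1.0 - alpha) / p)


def amdahl_efficiency(alpha: float, p: int) -> float:
    """Speedup divided by the number of HW threads."""
    return amdahl_speedup(alpha, p) / p


if __name__ == "__main__":
    alpha = 0.10  # 10% of the code is sequential
    for p in (1, 2, 10, 100, 1000):
        s = amdahl_speedup(alpha, p)
        e = amdahl_efficiency(alpha, p)
        print(f"P={p:5d}  speedup <= {s:5.2f}  efficiency = {e:6.1%}")
```

With 10% sequential code the speedup never exceeds 1/α = 10, and efficiency already falls to roughly one half around 10 threads, which is the rule of thumb stated on the slide.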
![Page 12: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/12.jpg)
Speedup
• Measure of how much faster the computation executes versus the best serial code
• Serial time divided by parallel time
• Example: painting a picket fence
  • 30 minutes of preparation (serial)
  • One minute to paint a single picket
  • 30 minutes of cleanup (serial)
  • Thus, 300 pickets take 360 minutes (serial time)
Speedup and Efficiency
Slide from Marc Snir
![Page 13: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/13.jpg)
Computing Speedup

| Number of painters | Time (minutes) | Speedup |
|---|---|---|
| 1 | 30 + 300 + 30 = 360 | 1.0X |
| 2 | 30 + 150 + 30 = 210 | 1.7X |
| 10 | 30 + 30 + 30 = 90 | 4.0X |
| 100 | 30 + 3 + 30 = 63 | 5.7X |
| Infinite | 30 + 0 + 30 = 60 | 6.0X |

• Speedup = $T_1/T_P$
• Speedup ≤ $P$ ($P$ workers reduce time by at most a factor of $P$)
• Speedup ≤ $T_1/T_\infty$ (Amdahl’s law)
Amdahl’s Law: potential speedup is restricted by the serial portion
Slide from Marc Snir
![Page 14: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/14.jpg)
Speedup
• $T_1/T_P$
• Speedup is usually sub-linear and reaches a plateau
  • How could one have superlinear speedup?
[Figure: ideal speedup vs. actual speedup curves, both axes ranging from 1 to 19]
Slide from Marc Snir
![Page 15: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/15.jpg)
Efficiency
| Number of painters | Time (minutes) | Speedup | Efficiency |
|---|---|---|---|
| 1 | 360 | 1.0X | 100% |
| 2 | 30 + 150 + 30 = 210 | 1.7X | 85% |
| 10 | 30 + 30 + 30 = 90 | 4.0X | 40% |
| 100 | 30 + 3 + 30 = 63 | 5.7X | 5.7% |
| Infinite | 30 + 0 + 30 = 60 | 6.0X | 0% |

• Measure of how effectively computation resources (threads) are kept busy
• Speedup divided by the number of HW threads
Slide from Marc Snir
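The fence-painting figures can be recomputed with a few lines of Python. This is a small sketch using only the numbers given on the slides (30 minutes of preparation, 30 minutes of cleanup, one minute per picket, 300 pickets); the function names are illustrative.

```python
PREP, CLEANUP = 30, 30        # serial parts, in minutes
PER_PICKET, PICKETS = 1, 300  # parallelizable part


def paint_time(painters: float) -> float:
    """Total time in minutes when the painting itself is split among painters."""
    return PREP + PER_PICKET * PICKETS / painters + CLEANUP


def speedup(painters: float) -> float:
    """Serial time divided by parallel time."""
    return paint_time(1) / paint_time(painters)


def efficiency(painters: float) -> float:
    """Speedup divided by the number of painters."""
    return speedup(painters) / painters


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        print(f"{n:4d} painters: {paint_time(n):5.0f} min, "
              f"speedup {speedup(n):.1f}X, efficiency {efficiency(n):.1%}")
```

The unrounded efficiency for 2 painters is about 85.7%; the 85% on the slide corresponds to dividing the rounded speedup of 1.7 by 2.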
![Page 16: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/16.jpg)
Efficiency
• $T_1/(P \cdot T_P)$
• Efficiency is usually < 1 and decreasing
[Figure: ideal vs. actual efficiency curves, vertical axis ranging from 0 to 1]
Slide from Marc Snir
![Page 17: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/17.jpg)
Pattern mining on multicore processors
![Page 18: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/18.jpg)
Multicore processors
• Most of today's processors are multicore processors
  • Moore's law is still active: the number of transistors on a chip doubles every 18 months
  • But clock frequency doesn’t increase anymore…
  • …so computing power comes from multiple cores on a chip
• Multicore processors have
  • Independent computing cores (usually from 2 to 12)
  • Shared L3 cache
  • Shared or private L2 cache
  • Private L1 cache
![Page 19: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/19.jpg)
Example – Intel Nehalem/Westmere
• 4-10 cores
• 1 or 2 threads per core
• Vector unit per core
• Three cache levels:
  • Private L1 (32 KB data, 32 KB instruction)
  • Private L2 (512 KB)
  • Shared L3 (16-30 MB)
• 32 nm technology
• Can be assembled in quad-chip SMPs
~50 Gflop/s peak!
Slide from Marc Snir
![Page 20: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/20.jpg)
Cache hierarchy / architecture schema
hwloc library: retrieves the architecture schema
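For a quick look at the cache hierarchy without hwloc, the information can also be read from the Linux sysfs tree. The sketch below is not hwloc; it assumes a Linux machine exposing `/sys/devices/system/cpu/cpuN/cache/indexM/`, and returns an empty list when that directory is absent. The function name is illustrative.

```python
from pathlib import Path


def cache_hierarchy(cpu: int = 0) -> list[dict]:
    """Describe the caches visible from one logical CPU (Linux sysfs only)."""
    base = Path(f"/sys/devices/system/cpu/cpu{cpu}/cache")
    caches = []
    if not base.is_dir():
        return caches  # not Linux, or sysfs not available
    for index in sorted(base.glob("index*")):
        entry = {}
        for field in ("level", "type", "size", "shared_cpu_list"):
            try:
                entry[field] = (index / field).read_text().strip()
            except OSError:
                entry[field] = None  # attribute missing or unreadable
        caches.append(entry)
    return caches


if __name__ == "__main__":
    for c in cache_hierarchy():
        print(f"L{c['level']} {c['type']:<12} {c['size']:>8}  "
              f"shared by CPUs {c['shared_cpu_list']}")
```

The `shared_cpu_list` field shows which cores share a given cache, which is the information that matters when reasoning about private versus shared cache levels.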
![Page 21: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/21.jpg)
Multicore pitfalls for the pattern miner
• Synchronization
  • Protects accesses to shared memory areas
  • Sequentializes code
  • Avoid it: tree-shaped search space exploration (more on this later)
• Load imbalance
  [Figure: Thread 1 finishes early and stays idle until Thread 2 finishes]
• Bus bandwidth saturation
  • N cores / 1 bus to connect them to memory
  • Computations are much faster than memory transfers
  • Or too many data transfers
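The load imbalance pitfall can be illustrated with a toy simulation. The sketch below compares a static split of the task list into equal-sized contiguous blocks with a dynamic scheme where each task goes to the first worker that becomes free; the task costs are made up for illustration and the function names are hypothetical.

```python
import heapq


def static_makespan(costs, workers):
    """Each worker gets one contiguous block with the same number of tasks."""
    chunk = -(-len(costs) // workers)  # ceiling division
    return max(sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk))


def dynamic_makespan(costs, workers):
    """Tasks are taken one by one by whichever worker is free first."""
    finish = [0] * workers  # min-heap of worker finish times
    for cost in costs:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + cost)
    return max(finish)


if __name__ == "__main__":
    costs = [10, 9, 1, 1, 1, 1, 1, 1]  # skewed workload, arbitrary time units
    print("static :", static_makespan(costs, 2))
    print("dynamic:", dynamic_makespan(costs, 2))
```

With the static split, one worker receives both expensive tasks while the other sits idle for most of the run; the dynamic scheme ends close to the ideal of half the total work per worker.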
![Page 22: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/22.jpg)
Case study #1: subtree mining / Tatikonda et al.
• Paper: Shirish Tatikonda, Srinivasan Parthasarathy: Mining Tree-Structured Data on Multicore Systems. PVLDB 2(1): 694-705 (2009)
• Excellent illustration of the impact of bandwidth pressure on pattern mining
![Page 23: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/23.jpg)
Subtree mining 101
• Input: tree database where each transaction is a tree
• Output: frequent subtree patterns
• Relies on subtree inclusion (costly):
  • Induced subtree: preserves parent-child relationships
  • Embedded subtree: preserves ancestor-descendant relationships
  • Every induced subtree is an embedded subtree
[Figure: a data tree over the labels A, B, C, D, shown next to an induced subtree (A with children B and C) and an embedded subtree (A with descendants D and B)]
![Page 24: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/24.jpg)
Algorithm overview
• Two primary steps
  • Candidate subtree generation
    • Generate all possible candidate subtrees
    • Challenge: search space traversal
  • Support counting
    • Evaluate the frequency of each candidate
    • Challenge: subtree isomorphism
• A recursive pattern-growth approach
  • Start with a seed pattern (a single node)
  • Repeatedly grow the pattern by adding nodes (pattern extension)
    • This step corresponds to search space traversal
  • Evaluate the frequency of the generated pattern

Pattern_mine(P)
  loop
    sup = find_frequency(P)
    if sup ≥ θ
      P’ = grow P with a new node
      Pattern_mine(P’)

[Figure: search tree growing from a seed pattern (single node A) through successive node extensions]
How can we do it efficiently?
Slide from Shirish Tatikonda
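The recursive pattern-growth scheme can be made concrete with a runnable sketch. To keep it short, the example below swaps subtrees for itemsets: the pattern is a set of items, growing a pattern means adding one item, and the frequency test is plain set inclusion instead of subtree isomorphism. This is an illustrative simplification, not the algorithm of Tatikonda and Parthasarathy; all names are hypothetical.

```python
def find_frequency(pattern, database):
    """Support = number of transactions that contain the pattern."""
    return sum(1 for transaction in database if pattern <= transaction)


def pattern_mine(pattern, database, theta, items, out):
    """Depth-first pattern growth with support threshold theta."""
    sup = find_frequency(pattern, database)
    if sup < theta:
        return  # prune: no extension of an infrequent pattern is frequent
    if pattern:
        out[pattern] = sup
    last = max(pattern) if pattern else None
    for item in items:
        # Grow only with items larger than the last one,
        # so that every pattern is generated exactly once.
        if last is None or item > last:
            pattern_mine(pattern | {item}, database, theta, items, out)


database = [frozenset("ABC"), frozenset("AB"), frozenset("AC"), frozenset("BC")]
items = sorted(set().union(*database))
result = {}
pattern_mine(frozenset(), database, 2, items, result)

if __name__ == "__main__":
    for pattern, sup in sorted(result.items(), key=lambda kv: sorted(kv[0])):
        print("".join(sorted(pattern)), sup)
```

Each recursive call explores an independent branch of the search space, which is what makes this kind of exploration attractive for parallel execution.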
![Page 25: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/25.jpg)
Usual approach
• Search space exploration:
  • Represent trees with sequences (Asai et al., Zaki)
  • Explore the search space of sequences
• Subtree inclusion test
  • Costly: store found embeddings -> embedding lists
  • Pre-2005 tradeoff: more memory than computing power
![Page 26: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/26.jpg)
Bandwidth usage of TreeMiner (Zaki et al. 2005)
• Needs many transfers of embedding lists…
• …that have poor cache locality
• Result:
  • 1.2 GB/s usage per core!
  • Speedup: ~2 on 8 processors…
![Page 27: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/27.jpg)
Reducing bandwidth usage
• Store embedding lists -> Recompute embedding lists on the fly
  • Post-2005, CPU is cheap, memory is expensive!
• Only process a fixed number of embeddings at a time
• Resulting bandwidth usage: TreeMiner 1.2 GB/s, TRIPS 200 MB/s
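The idea of recomputing embeddings on the fly and handling only a fixed number of them at a time can be sketched with a Python generator. The example is a toy stand-in: an "embedding" here is a pair of positions in a label sequence rather than a tree embedding, and none of the names come from TRIPS or TreeMiner.

```python
from itertools import islice


def embeddings(sequence, a, b):
    """Lazily yield every pair (i, j), i < j, with sequence[i] == a and sequence[j] == b.

    Nothing is stored: embeddings are recomputed each time the generator runs.
    """
    for i, x in enumerate(sequence):
        if x == a:
            for j in range(i + 1, len(sequence)):
                if sequence[j] == b:
                    yield (i, j)


def count_in_chunks(generator, chunk_size):
    """Consume embeddings chunk by chunk; return (total count, largest chunk held)."""
    total, peak = 0, 0
    while True:
        chunk = list(islice(generator, chunk_size))
        if not chunk:
            break
        total += len(chunk)
        peak = max(peak, len(chunk))  # memory in use never exceeds chunk_size
    return total, peak


if __name__ == "__main__":
    print(count_in_chunks(embeddings("ABAB", "A", "B"), chunk_size=2))
```

The working set is bounded by `chunk_size` regardless of how many embeddings exist, at the price of recomputing them when they are needed again.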
![Page 28: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/28.jpg)
Other challenge: task partitioning
• The search space is partitioned into equivalence classes
• Each equivalence class contains many patterns
• Processing each pattern involves many trees
Workload skew is present at every level
― One equivalence class may contain more patterns than another
― Processing one pattern may be more expensive than another
― One tree may be bigger than another
Challenge: load balancing in the presence of skew
Slide from Shirish Tatikonda
![Page 29: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/29.jpg)
Adaptive Design
Key idea: adaptively and automatically adjust the type and granularity of work that is shared among cores
• A thread pool serves three job pools: task pool, tree pool, column pool
• Task pool (coarse-grain parallelism): process multiple patterns in parallel
• Tree pool: process a single pattern, i.e., multiple trees, in parallel
• Column pool (fine-grain parallelism): process a single pattern w.r.t. a single tree in parallel (i.e., the dynamic programming matrix is processed in parallel)
[Figure: thread pool connected to the job pools through a context switch, with transitions labelled "Pools are empty" and "Work is ready"]
Slide from Shirish Tatikonda
![Page 30: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/30.jpg)
Implementation of Parallel Algorithm
[Figure: diagram of miningMethod(...) with the following labels: check the complexity of the current work; job-spawning condition; Process_the_job; task-parallel, data-parallel (tree pool) and chunk-parallel (chunk pool) modes; thread pool; light-weight context switching; general-purpose scheduling service]
Slide from Shirish Tatikonda
![Page 31: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/31.jpg)
Performance – Parallel Efficiency
On two data sets: Cslogs (web analytics) and Treebank (computational linguistics)
[Figure, left: speedup vs. number of cores (1 to 8) on a dual quad-core system, for Cslogs and Treebank]
[Figure, right: speedup vs. number of processors (1 to 16) on an SGI Altix SMP system, for Cslogs and Treebank, with and without fine-grained parallelism]
1) Near-linear speedups
2) Need for fine-grain parallelism
― Without it, the speedups saturate
3) Memory optimizations are critical
― Without them, the algorithms are not scalable (speedup of 1.7 on 8 processors)
Slide from Shirish Tatikonda
![Page 32: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/32.jpg)
Map of pattern mining families
• Constraint pattern mining
• Generic pattern mining (strong accessibility): Boley et al., 07-10; Arimura & Uno, 09
• ParaMiner [DMKD, 14] (strong accessibility + decomposability)
• Specific approaches, e.g. FIS: Apriori, 93; FPGrowth, 00; LCM, 04
![Page 33: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/33.jpg)
ParaMiner
2014
![Page 34: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/34.jpg)
ParaMiner: algorithm
![Page 35: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/35.jpg)
ParaMiner’s initial scalability
![Page 36: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/36.jpg)
[Figure: average latency evolution in ParaMiner; x-axis: #cores (1, 2, 4, 8, 16, 32); y-axis: average latency (cycles), from 0 to 14; series: GRI and FIS]
![Page 37: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/37.jpg)
[Figure: enumeration tree with levels (A, B, C), (AB, AC, BC, BD, CD, CE) and (ABC, ABD, ACE, BCD, CDE, CDF, CEF); the Select step of every node accesses the full dataset]
![Page 38: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/38.jpg)
dataset reduction
[Figure: same enumeration tree; the Select steps of the first level access the full dataset, those of deeper levels access reduced datasets (DA, DB, DC, then DAB, DAC, DBC, DCD, DCE)]
![Page 39: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/39.jpg)
dataset reduction 2.0
[Figure: same enumeration tree; the Select steps access a mix of the full dataset and of reduced datasets (DA, DAB, DC, DCE)]
![Page 40: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/40.jpg)
| Problem | Speedup (before), max: 32 | Speedup (after), max: 32 |
|---|---|---|
| Frequent Itemset Mining (dense data) | 3 | 21 |
| Frequent Itemset Mining (sparse data) | 11 | 25 |
| Gradual Pattern Mining | 27.5 | 28.5 |
| Closed Relational Graph Mining | 3 | 5 |
![Page 41: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/41.jpg)
Conclusion for multicores
• Ubiquitous parallel environment
• Getting easier to program
  • C++11, futures/promises, async…
  • Java 8 Streams
  • Scala actors…
• Main problem: cores contend for bus bandwidth
  • Requires designing algorithms with a small working set
• Use the right profiling tools!
  • Java -> YourKit; C++/other -> VTune, Linux hardware counters library
![Page 42: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/42.jpg)
Vtune (Intel)
![Page 43: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/43.jpg)
Pattern mining on clusters
![Page 44: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/44.jpg)
Clusters
• Homogeneous/heterogeneous network of (multicore) machines
• Network of clusters: grid -> backbone of cloud
• Cheapest way to get tremendous
  • Computing power
  • RAM
  • Storage space
• Introduces new problems
  • Slow communications between nodes
  • Data locality
  • Fault tolerance
![Page 45: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/45.jpg)
Cluster computing main environments
• MPI
• MapReduce
• Spark
![Page 46: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/46.jpg)
MPI
• Message-passing paradigm
• The programmer has total control over communications
• One abstraction level above socket programming
• Communication primitives
  • One-to-one
  • One-to-many / one-to-all
  • Many-to-many / all-to-all
• Messages are byte arrays
• No fault tolerance
=> Powerful but hard to use correctly
![Page 47: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/47.jpg)
MapReduce
• Based on the functional paradigm
• Two types of operations
  • Map
  • Reduce
• Hadoop also offers:
  • Distributed file system (HDFS)
  • Fault tolerance
    • For files
    • For Map/Reduce jobs
• Everything is committed to disk
  • Super slow
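The Map and Reduce operations can be illustrated in plain Python, without Hadoop. The sketch below counts item frequencies over a few transactions, the usual first step of frequent itemset mining; it runs in a single process and only mimics the map, shuffle and reduce phases. The data and function names are made up for illustration.

```python
from collections import defaultdict


def map_phase(transaction):
    """Map: emit one (item, 1) pair per item of the transaction."""
    return [(item, 1) for item in transaction]


def shuffle(pairs):
    """Group values by key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the partial counts of one item."""
    return key, sum(values)


transactions = [["bread", "milk"], ["bread", "beer"], ["milk", "bread", "eggs"]]
pairs = [pair for t in transactions for pair in map_phase(t)]
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())

if __name__ == "__main__":
    print(counts)
```

Because each call to `map_phase` and `reduce_phase` depends only on its own input, a real framework can run them on different nodes and restart any of them after a failure.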
![Page 48: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/48.jpg)
Spark
• Designed for iterative computations (including data mining)
  • Data stored in memory (and/or on disk)
• One to two orders of magnitude faster than Map/Reduce
• Based on RDD: Resilient Distributed Dataset
  • Distributed collection paradigm
  • An RDD is divided into blocks -> each block fits into a node’s RAM
  • RDD transformation operations
• Fault tolerance via reconstruction
  • Keep RDD lineage as metadata
  • Recompute lost RDDs / RDD blocks
  • Computing power is cheap!
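Fault tolerance via lineage can be shown with a toy, single-process model. The class below is not Spark's API: `ToyRDD` is a hypothetical name, and the sketch only keeps, for each derived dataset, a reference to its parent and the function that produced it, so that a lost block can be recomputed instead of being restored from a replica.

```python
class ToyRDD:
    """Toy block-partitioned collection that remembers how it was derived."""

    def __init__(self, blocks=None, parent=None, fn=None):
        self.parent, self.fn = parent, fn  # lineage metadata
        self.cache = dict(enumerate(blocks)) if blocks is not None else {}
        self.num_blocks = len(blocks) if blocks is not None else parent.num_blocks

    def map(self, fn):
        """Lazy transformation: only the lineage is recorded."""
        return ToyRDD(parent=self, fn=fn)

    def block(self, i):
        """Return block i, recomputing it from the parent if it is missing."""
        if i not in self.cache:
            if self.parent is None:
                raise RuntimeError("source block lost: re-read it from stable storage")
            self.cache[i] = [self.fn(x) for x in self.parent.block(i)]
        return self.cache[i]

    def collect(self):
        return [x for i in range(self.num_blocks) for x in self.block(i)]


source = ToyRDD(blocks=[[1, 2], [3, 4]])
squares = source.map(lambda x: x * x)
first_result = squares.collect()
del squares.cache[1]              # simulate the loss of one block
recovered = squares.block(1)      # rebuilt from the lineage

if __name__ == "__main__":
    print(first_result, recovered)
```

Only the missing block is recomputed, which is why keeping the lineage is cheaper than replicating every intermediate result.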
![Page 49: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/49.jpg)
Mining top-k-per-item over clusters
• Web data: long tail
• Standard Frequent Itemset Mining + long tail:
Slide parts: Martin Kirchgessner / Vincent Leroy
![Page 50: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/50.jpg)
Top-k-per-item frequent itemsets
Slide: Martin Kirchgessner / Vincent Leroy
![Page 51: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/51.jpg)
Mining top-k-per-item itemsets
• TopPI
  • Martin Kirchgessner, Vincent Leroy et al., Grenoble-Alpes University
  • In submission
  • Computes top-k closed frequent itemsets per item
  • Based on heavily modified LCM
  • Multicore and MapReduce versions
• PFP: Parallel FP-Growth
  • Li et al., RecSys 2008
  • Based on FPGrowth
  • The frequent itemset miner of Mahout (MapReduce)
  • Computes at most k itemsets per item
  • Sloppy output definition
![Page 52: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/52.jpg)
Example (TopPI)
![Page 53: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/53.jpg)
Reminder: closure extension (Uno et al., 03)
Each branch generates different itemsets, no need for synchronization
![Page 54: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/54.jpg)
Overview of general TopPI algorithm
![Page 55: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/55.jpg)
TopPI over MapReduce
• Distribute branches over nodes
  • Branch = starting item
  • Each node receives a set of starting items G to process
• Distribute the top-k collector
  • Collector: for each item, a heap of size k storing the current top-k for this item
  • In a distributed setting, a node can only fill collectors for items of G
  • May not be the actual top-k of these items
• Second phase: workers get the complement top-k (items not in G)
• Merge both
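The collector described above (one heap of size k per item, then a merge of partial results) can be sketched with Python's `heapq`. This is an illustrative sketch, not TopPI's implementation: the class name, the ranking by support only, and the tie-breaking on the itemset are assumptions.

```python
import heapq


class TopKCollector:
    """For each item, keep the k itemsets of highest support containing it."""

    def __init__(self, k):
        self.k = k
        self.heaps = {}  # item -> min-heap of (support, sorted itemset tuple)

    def _offer(self, item, entry):
        heap = self.heaps.setdefault(item, [])
        if entry in heap:
            return  # already collected for this item
        if len(heap) < self.k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)  # evict the current k-th best

    def collect(self, itemset, support):
        """Register an itemset in the collector of each of its items."""
        entry = (support, tuple(sorted(itemset)))
        for item in itemset:
            self._offer(item, entry)

    def merge(self, other):
        """Fold the partial top-k of another worker into this collector."""
        for item, heap in other.heaps.items():
            for entry in heap:
                self._offer(item, entry)

    def top(self, item):
        return sorted(self.heaps.get(item, []), reverse=True)


local = TopKCollector(k=2)
local.collect({"a", "b"}, 5)
local.collect({"a"}, 9)
local.collect({"a", "c"}, 3)

remote = TopKCollector(k=2)
remote.collect({"a", "d"}, 7)

local.merge(remote)

if __name__ == "__main__":
    print(local.top("a"))
    print(local.top("d"))
```

The heap root is always the weakest of the current top-k, so deciding whether a new itemset enters the collector costs a single comparison.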
![Page 56: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/56.jpg)
![Page 57: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/57.jpg)
PFP
• Partition the database over the workers (shards)
• Count the frequency of all individual items & organize them in groups
  • Same as in TopPI
• Make group-dependent transactions (conditional datasets), mine each with FP-Growth
• Aggregate discovered frequent itemsets
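The construction of group-dependent transactions can be sketched as follows. The code follows the general scheme of PFP as commonly described (items ranked by decreasing frequency, ranks cut into groups, and for each transaction one prefix emitted per group), but the contiguous grouping, the tie-breaking and all names are assumptions made for illustration; the local FP-Growth step is left out.

```python
from collections import Counter, defaultdict


def group_dependent_transactions(database, num_groups):
    """Return (item -> group id, group id -> list of conditional transactions)."""
    counts = Counter(item for t in database for item in t)
    # Rank items by decreasing frequency, ties broken alphabetically.
    ranked = [item for item, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
    rank_of = {item: rank for rank, item in enumerate(ranked)}
    group_size = -(-len(ranked) // num_groups)  # ceiling division
    group_of = {item: rank // group_size for item, rank in rank_of.items()}

    shards = defaultdict(list)
    for t in database:
        ordered = sorted(set(t), key=rank_of.__getitem__)
        seen = set()
        # Scan from the least frequent item: the first item met for a group
        # gives the longest prefix that this group needs.
        for j in range(len(ordered) - 1, -1, -1):
            g = group_of[ordered[j]]
            if g not in seen:
                seen.add(g)
                shards[g].append(ordered[:j + 1])
    return group_of, shards


database = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
group_of, shards = group_dependent_transactions(database, num_groups=2)

if __name__ == "__main__":
    print(group_of)
    for g in sorted(shards):
        print(g, shards[g])
```

Each shard can then be mined independently by one worker, which only reports itemsets involving the items of its own group.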
![Page 58: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/58.jpg)
Results – time comparison
• LastFM: 1.2M lines x 1.2M columns (277 MB)
• Supermarket: 55M lines x 400k columns (2.8 GB)
• Cluster: 51 nodes, each with 2 Xeon E5520 (4 cores) and 24 GB RAM; 4 tasks / node
![Page 59: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/59.jpg)
Results – TopPI speedup
![Page 60: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/60.jpg)
Results – output comparison
![Page 61: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/61.jpg)
Arabesque
• Arabesque: A System for Distributed Graph Mining
Carlos H. C. Teixeira, Alexandre J. Fonseca, Marco Serafini, Georgos Siganos, Mohammed J. Zaki, Ashraf Aboulnaga. SOSP 2015
• Problem:
  • There are frameworks for analyzing very large graphs
    • Pregel, Giraph
  • But they are ill-suited to frequent subgraph mining
    • Their base element is the vertex
• Arabesque http://arabesque.io
  • Embedding as base element
  • Numerous problems fit this model
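To give a feel for what "embedding as base element" means, here is a minimal single-machine sketch. It is not Arabesque's API: the names `expand`, `explore` and `is_clique` are hypothetical, and duplicates are removed with plain sets, whereas a distributed system needs a cheaper canonicality check. The user only supplies a filter on embeddings; the exploration loop grows every surviving embedding by one neighbouring vertex per step.

```python
from itertools import combinations


def expand(embedding, graph):
    """Yield every embedding obtained by adding one neighbouring vertex."""
    for v in embedding:
        for w in graph[v]:
            if w not in embedding:
                yield embedding | {w}  # still a frozenset


def explore(graph, max_size, keep):
    """Level-wise exploration; levels[i] holds the kept embeddings of size i+1."""
    levels = []
    frontier = {frozenset([v]) for v in graph}
    while frontier and len(levels) < max_size:
        levels.append(frontier)
        # The set comprehension removes embeddings reached by several paths.
        frontier = {cand
                    for emb in frontier
                    for cand in expand(emb, graph)
                    if keep(cand, graph)}
    return levels


def is_clique(embedding, graph):
    """Example filter: every pair of vertices in the embedding is adjacent."""
    return all(b in graph[a] for a, b in combinations(embedding, 2))


# Toy graph as adjacency sets: a triangle 1-2-3 plus a pendant edge 3-4.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
levels = explore(graph, max_size=3, keep=is_clique)
```

Pruning with the filter at each level is only safe when the filter is anti-monotone (if an embedding fails, all its extensions fail too), which holds for cliques. Swapping `is_clique` for another filter changes the mining problem without touching the exploration loop.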
![Page 62: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/62.jpg)
Slide Mohammed J. Zaki
![Page 63: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/63.jpg)
Slide Mohammed J. Zaki
![Page 64: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/64.jpg)
Slide Mohammed J. Zaki
![Page 65: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/65.jpg)
Slide Mohammed J. Zaki
![Page 66: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/66.jpg)
Slide Mohammed J. Zaki
![Page 67: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/67.jpg)
Slide Mohammed J. Zaki
![Page 68: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/68.jpg)
Conclusion for clusters
• Choose the right environment, stay alert for new approaches
  • Currently Spark is the way to go for most people
• Make your computations as independent as possible
  • -> tree-shaped search space exploration (but risk of load imbalance)
  • -> limit the number of “barriers”
• Control your data partitioning
• Use the right profiling tools!
![Page 69: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/69.jpg)
Spark UI
![Page 70: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/70.jpg)
Pattern mining on GPUs
![Page 71: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/71.jpg)
GPUs
• Different paradigm
  • CPU: low latency, low throughput
  • GPU: high latency, high throughput
• Many simple cores
  • Simpler control logic
  • Less cache
• Data parallelism
  • SIMD
![Page 72: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/72.jpg)
GPUs for pattern mining
• Usual pattern mining approaches: task parallelism
  • Ill-suited to GPUs
• Slow data transfers between host and GPU
• => few GPU pattern mining approaches
  • Apriori based on bitsets + vertical format
  • FP-Growth based on array representation of FP-tree
![Page 73: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/73.jpg)
Apriori vertical + bitsets on GPU
Wenbin Fang, Mian Lu, Xiangye Xiao, Bingsheng He, Qiong Luo:
Frequent itemset mining on graphics processors. DaMoN 2009: 34-42
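The idea behind the vertical format with bitsets can be sketched as follows. This is a CPU-side illustration in plain Python, not the code from the paper: arbitrary-precision integers stand in for the bitsets, and the names `to_vertical_bitsets` and `support` are made up for the example. Each item gets one bit per transaction, and the support of an itemset is the number of set bits in the AND of its items' bitsets. On a GPU, the AND and the bit counting are applied to many machine words in parallel, which is what makes this representation a good fit for data parallelism.

```python
def to_vertical_bitsets(transactions):
    """Map each item to an integer whose bit t is set if transaction t contains it."""
    bitsets = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            bitsets[item] = bitsets.get(item, 0) | (1 << tid)
    return bitsets


def support(itemset, bitsets):
    """Support = population count of the AND of the items' bitsets."""
    items = iter(itemset)
    acc = bitsets.get(next(items), 0)
    for item in items:
        acc &= bitsets.get(item, 0)
    return bin(acc).count("1")


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
bitsets = to_vertical_bitsets(transactions)
print(support(("a", "b"), bitsets))       # transactions 0, 1 and 4 contain both
print(support(("a", "b", "c"), bitsets))  # transactions 0 and 4 contain all three
```

Candidate generation and pruning follow the usual Apriori scheme and are left out here; only the support counting step changes with this representation.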
![Page 74: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/74.jpg)
Comparison: Apriori on GPU vs. FP-Growth on CPU
![Page 75: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/75.jpg)
Conclusion on GPUs
• Pattern mining on GPUs?
  • Risky business
  • Most researchers who published on it changed topic…
• Novel manycore processors may be better adapted to the task
![Page 76: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/76.jpg)
Some hints on Manycores
![Page 77: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/77.jpg)
Manycore processors
• Middle ground between multicores and GPUs
  • ~100s of cores
• Cores
  • Simpler than those of multicores
  • More complex than GPU cores
• Cores have
  • Full-fledged control logic
  • Some cache
![Page 78: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/78.jpg)
Current manycores
• Intel Xeon Phi
  • 61 cores
  • 512 KB L2 cache per core
  • Extension board (same as GPU)
• Kalray MPPA
  • 256 cores
  • 16 clusters of 16 cores, 2 MB per cluster
• Tilera GX
  • 72 cores
  • 256 KB L2 cache per core
• All: cache coherency optional
![Page 79: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/79.jpg)
Manycores for pattern mining?
• Better control logic than GPU…
• …however:
  • Small caches / onboard memory (working set must be kept small)
  • Slow data transfers with host (as of now)
• Manycores are designed for complex streaming applications:
  • May be suited to online pattern mining
• Also designed for performance per watt
  • See: Emilio Francesquini, Márcio Bastos Castro, Pedro H. Penna, Fabrice Dupros, Henrique C. Freitas, Philippe Olivier Alexandre Navaux, Jean-François Méhaut: On the energy efficiency and performance of irregular application executions on multicore, NUMA and manycore platforms. J. Parallel Distrib. Comput. 76: 32-48 (2015)
![Page 80: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/80.jpg)
Conclusion
• Parallelism is getting easier and easier to get into
• -> provides the necessary gain in performance
• Performance should be used to:
  • Show that I am faster than colleagues and get papers (so 2005!)
  • Extract more significant patterns
  • Allow better interactivity with analysts
    • See KDD IDEA workshop 2013-2015
    • See our CIKM 2015 paper (Omidvar Tehrani et al.)
![Page 81: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/81.jpg)
Backup slides
![Page 82: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/82.jpg)
![Page 83: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/83.jpg)
![Page 84: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/84.jpg)
![Page 85: Pattern mining in parallel environmentspeople.irisa.fr/Alexandre.Termier/dmv/DMV_CM5.pdf•Pattern mining: find (interesting) patterns in data •cf Marc’s previous courses •Need](https://reader030.vdocuments.mx/reader030/viewer/2022040919/5e9641cb5fb8bb174726a4b0/html5/thumbnails/85.jpg)