cmu scs 15-826: multimedia databases and data mining lecture #9: fractals - introduction c....
TRANSCRIPT
![Page 1: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/1.jpg)
CMU SCS
15-826: Multimedia Databases and Data Mining
Lecture #9: Fractals - introduction
C. Faloutsos
![Page 2: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/2.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 2
Must-read Material
• Christos Faloutsos and Ibrahim Kamel, Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension, Proc. ACM SIGACT-SIGMOD-SIGART PODS, May 1994, pp. 4-13, Minneapolis, MN.
![Page 3: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/3.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 3
Recommended Material
optional, but very useful:
• Manfred Schroeder Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991
(on reserve in the library)– Chapter 10: boxcounting method– Chapter 1: Sierpinski triangle
![Page 4: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/4.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 4
Outline
Goal: ‘Find similar / interesting things’
• Intro to DB
• Indexing - similarity search
• Data Mining
![Page 5: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/5.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 5
Indexing - Detailed outline• primary key indexing• secondary key / multi-key indexing• spatial access methods
– z-ordering– R-trees– misc
• fractals– intro– applications
• text
![Page 6: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/6.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 6
Intro to fractals - outline
• Motivation – 3 problems / case studies
• Definition of fractals and power laws
• Solutions to posed problems
• More examples and tools
• Discussion - putting fractals to work!
• Conclusions – practitioner’s guide
• Appendix: gory details - boxcounting plots
![Page 7: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/7.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 7
Road end-points of Montgomery county:
•Q1: how many d.a. for an R-tree?
•Q2 : distribution?
•not uniform
•not Gaussian
•no rules??
Problem #1: GIS - points
![Page 8: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/8.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 8
Problem #2 - spatial d.m.
Galaxies (Sloan Digital Sky Survey w/ B. Nichol)
- ‘spiral’ and ‘elliptical’ galaxies
(stores and households ...)
- patterns?
- attraction/repulsion?
- how many ‘spi’ within r from an ‘ell’?
![Page 9: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/9.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 9
Problem #3: traffic
• disk trace (from HP - J. Wilkes); Web traffic - fit a model
• how many explosions to expect?
• queue length distr.?time
# bytes
![Page 10: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/10.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 10
Problem #3: traffic
time
# bytes
Poisson indep., ident. distr
![Page 11: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/11.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 11
Problem #3: traffic
time
# bytes
Poisson indep., ident. distr
![Page 12: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/12.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 12
Problem #3: traffic
time
# bytes
Poisson indep., ident. distr
Q: Then, how to generatesuch bursty traffic?
![Page 13: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/13.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 13
Common answer:
• Fractals / self-similarities / power laws
• Seminal works from Hilbert, Minkowski, Cantor, Mandelbrot, (Hausdorff, Lyapunov, Ken Wilson, …)
![Page 14: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/14.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 14
Road map
• Motivation – 3 problems / case studies
• Definition of fractals and power laws
• Solutions to posed problems
• More examples and tools
• Discussion - putting fractals to work!
• Conclusions – practitioner’s guide
• Appendix: gory details - boxcounting plots
![Page 15: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/15.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 15
What is a fractal?
= self-similar point set, e.g., Sierpinski triangle:
...zero area;
infinite length!
Dimensionality??
![Page 16: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/16.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 16
Definitions (cont’d)
• Paradox: Infinite perimeter ; Zero area!
• ‘dimensionality’: between 1 and 2
• actually: Log(3)/Log(2) = 1.58...
![Page 17: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/17.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 17
Dfn of fd:
ONLY for a perfectly self-similar point set:
=log(n)/log(f) = log(3)/log(2) = 1.58
...zero area;
infinite length!
![Page 18: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/18.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 18
Intrinsic (‘fractal’) dimension
• Q: fractal dimension of a line?
• A: 1 (= log(2)/log(2)!)
![Page 19: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/19.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 19
Intrinsic (‘fractal’) dimension
• Q: fractal dimension of a line?
• A: 1 (= log(2)/log(2)!)
![Page 20: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/20.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 20
Intrinsic (‘fractal’) dimension
• Q: dfn for a given set of points?
42
33
24
15
yx
![Page 21: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/21.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 21
Intrinsic (‘fractal’) dimension
• Q: fractal dimension of a line?
• A: nn ( <= r ) ~ r^1(‘power law’: y=x^a)
• Q: fd of a plane?• A: nn ( <= r ) ~ r^2fd== slope of (log(nn) vs
log(r) )
![Page 22: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/22.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 22
Intrinsic (‘fractal’) dimension
• Algorithm, to estimate it?Notice• avg nn(<=r) is exactly tot#pairs(<=r) / N
including ‘mirror’ pairs
![Page 23: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/23.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 23
Sierpinsky triangle
log( r )
log(#pairs within <=r )
1.58
== ‘correlation integral’
![Page 24: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/24.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 24
Observations:
• Euclidean objects have integer fractal dimensions – point: 0– lines and smooth curves: 1– smooth surfaces: 2
• fractal dimension -> roughness of the periphery
![Page 25: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/25.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 25
Important properties
• fd = embedding dimension -> uniform pointset
• a point set may have several fd, depending on scale
![Page 26: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/26.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 26
Important properties
• fd = embedding dimension -> uniform pointset
• a point set may have several fd, depending on scale
2-d
![Page 27: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/27.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 27
Important properties
• fd = embedding dimension -> uniform pointset
• a point set may have several fd, depending on scale
1-d
![Page 28: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/28.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 28
Important properties
0-d
![Page 29: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/29.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 29
Road map
• Motivation – 3 problems / case studies
• Definition of fractals and power laws
• Solutions to posed problems
• More examples and tools
• Discussion - putting fractals to work!
• Conclusions – practitioner’s guide
• Appendix: gory details - boxcounting plots
![Page 30: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/30.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 30
Cross-roads of Montgomery county:
•any rules?
Problem #1: GIS points
![Page 31: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/31.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 31
Solution #1
A: self-similarity ->• <=> fractals • <=> scale-free• <=> power-laws
(y=x^a, F=C*r^(-2))• avg#neighbors(<= r )
= r^D
log( r )
log(#pairs(within <= r))
1.51
![Page 32: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/32.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 32
Solution #1
A: self-similarity• avg#neighbors(<= r )
~ r^(1.51)
log( r )
log(#pairs(within <= r))
1.51
![Page 33: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/33.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 33
Examples:MG county
• Montgomery County of MD (road end-points)
![Page 34: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/34.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 34
Examples:LB county
• Long Beach county of CA (road end-points)
![Page 35: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/35.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 35
Solution#2: spatial d.m.Galaxies ( ‘BOPS’ plot - [sigmod2000])
log(#pairs)
log(r)
![Page 36: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/36.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 36
Solution#2: spatial d.m.
log(r)
log(#pairs within <=r )
spi-spi
spi-ell
ell-ell
- 1.8 slope
- plateau!
- repulsion!
![Page 37: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/37.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 37
Spatial d.m.
log(r)
log(#pairs within <=r )
spi-spi
spi-ell
ell-ell
- 1.8 slope
- plateau!
- repulsion!
![Page 38: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/38.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 38
Spatial d.m.
r1r2
r1
r2
Heuristic on choosing # of clusters
![Page 39: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/39.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 39
Spatial d.m.
log(r)
log(#pairs within <=r )
spi-spi
spi-ell
ell-ell
- 1.8 slope
- plateau!
- repulsion!
![Page 40: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/40.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 40
Spatial d.m.
log(r)
log(#pairs within <=r )
spi-spi
spi-ell
ell-ell
- 1.8 slope
- plateau!
-repulsion!!
-duplicates
![Page 41: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/41.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 41
Solution #3: traffic
• disk traces: self-similar:
time
#bytes
![Page 42: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/42.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 42
Solution #3: traffic
• disk traces (80-20 ‘law’ = ‘multifractal’)
time
#bytes
20% 80%
![Page 43: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/43.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 43
80-20 / multifractals20 80
![Page 44: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/44.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 44
80-20 / multifractals20
• p ; (1-p) in general
• yes, there are dependencies
80
![Page 45: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/45.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 45
More on 80/20: PQRS
• Part of ‘self-* storage’ project [Wang+’02]
time
cylinder#
![Page 46: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/46.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 46
More on 80/20: PQRS
• Part of ‘self-* storage’ project [Wang+’02]
p q
r s
q
r s
![Page 47: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/47.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 47
Solution#3: traffic
Clarification:
• fractal: a set of points that is self-similar
• multifractal: a probability density function that is self-similar
Many other time-sequences are bursty/clustered: (such as?)
![Page 48: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/48.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 48
Example:
• network traffic
http://repository.cs.vt.edu/lbl-conn-7.tar.Z
![Page 49: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/49.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 49
Web traffic
• [Crovella Bestavros, SIGMETRICS’96]
1000 sec; 100sec10sec; 1sec
![Page 50: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/50.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 50
Tape accesses
time
Tape#1 Tape# N
# tapes needed, to retrieve n records?
(# days down, due to failures / hurricanes / communication noise...)
![Page 51: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/51.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 51
Tape accesses
time
Tape#1 Tape# N
# tapes retrieved
# qual. records
50-50 = Poisson
real
![Page 52: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/52.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 52
Road map
• Motivation – 3 problems / case studies
• Definition of fractals and power laws
• Solutions to posed problems
• More tools and examples
• Discussion - putting fractals to work!
• Conclusions – practitioner’s guide
• Appendix: gory details - boxcounting plots
![Page 53: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/53.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 53
A counter-intuitive example
• avg degree is, say 3.3• pick a node at random
– guess its degree, exactly (-> “mode”)
degree
count
avg: 3.3
?
![Page 54: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/54.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 54
A counter-intuitive example
• avg degree is, say 3.3• pick a node at random
– guess its degree, exactly (-> “mode”)
• A: 1!!
degree
count
avg: 3.3
![Page 55: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/55.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 55
A counter-intuitive example
• avg degree is, say 3.3• pick a node at random
- what is the degree you expect it to have?
• A: 1!!• A’: very skewed distr.• Corollary: the mean is
meaningless!• (and std -> infinity (!))
degree
count
avg: 3.3
![Page 56: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/56.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 56
Rank exponent R• Power law in the degree distribution
[SIGCOMM99]
internet domains
log(rank)
log(degree)
-0.82
att.com
ibm.com
![Page 57: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/57.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 57
More tools
• Zipf’s law
• Korcak’s law / “fat fractals”
![Page 58: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/58.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 58
A famous power law: Zipf’s law
• Q: vocabulary word frequency in a document - any pattern?
aaron zoo
freq.
![Page 59: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/59.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 59
A famous power law: Zipf’s law
• Bible - rank vs frequency (log-log)
log(rank)
log(freq)
“a”
“the”
![Page 60: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/60.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 60
A famous power law: Zipf’s law
• Bible - rank vs frequency (log-log)
• similarly, in many other languages; for customers and sales volume; city populations etc etclog(rank)
log(freq)
![Page 61: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/61.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 61
A famous power law: Zipf’s law
•Zipf distr:
freq = 1/ rank
•generalized Zipf:
freq = 1 / (rank)^a
log(rank)
log(freq)
![Page 62: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/62.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 62
Olympic medals (Sidney):
y = -0.9676x + 2.3054
R2 = 0.9458
0
0.5
1
1.5
2
2.5
0 0.5 1 1.5 2
Series1
Linear (Series1)
rank
log(#medals)
![Page 63: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/63.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 63
Olympic medals (Sidney’00, Athens’04):
log( rank)
log(#medals)
0
0.5
1
1.5
2
2.5
0 0.5 1 1.5 2
athens
sidney
![Page 64: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/64.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 64
TELCO data
# of service units
count ofcustomers
‘best customer’
![Page 65: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/65.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 65
SALES data – store#96
# units sold
count of products
“aspirin”
![Page 66: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/66.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 66
More power laws: areas – Korcak’s law
Scandinavian lakes
Any pattern?
![Page 67: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/67.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 67
More power laws: areas – Korcak’s law
Scandinavian lakes area vs complementary cumulative count (log-log axes)
log(count( >= area))
log(area)
![Page 68: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/68.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 68
More power laws: Korcak
Japan islands;
area vs cumulative count (log-log axes) log(area)
log(count( >= area))
![Page 69: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/69.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 69
(Korcak’s law: Aegean islands)
![Page 70: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/70.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 70
Korcak’s law & “fat fractals”
How to generate such regions?
![Page 71: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/71.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 71
Korcak’s law & “fat fractals”Q: How to generate such regions?A: recursively, from a single region
...
![Page 72: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/72.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 72
so far we’ve seen:
• concepts:– fractals, multifractals and fat fractals
• tools:– correlation integral (= pair-count plot)– rank/frequency plot (Zipf’s law)– CCDF (Korcak’s law)
![Page 73: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/73.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 73
so far we’ve seen:
• concepts:– fractals, multifractals and fat fractals
• tools:– correlation integral (= pair-count plot)– rank/frequency plot (Zipf’s law)– CCDF (Korcak’s law)
sameinfo
![Page 74: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/74.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 74
Road map
• Motivation – 3 problems / case studies
• Definition of fractals and power laws
• Solutions to posed problems
• More tools and examples
• Discussion - putting fractals to work!
• Conclusions – practitioner’s guide
• Appendix: gory details - boxcounting plots
![Page 75: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/75.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 82
Fractals & power laws:
appear in numerous settings:
• medical
• geographical / geological
• social
• computer-system related
![Page 76: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/76.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 83
More apps: Brain scans
• Oct-trees; brain-scans
octree levels
Log(#octants)
![Page 77: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/77.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 84
More apps: Brain scans
• Oct-trees; brain-scans
octree levels
Log(#octants)
2.63 = fd
![Page 78: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/78.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 85
More apps: Medical images
[Burdett et al, SPIE ‘93]:
• benign tumors: fd ~ 2.37
• malignant: fd ~ 2.56
![Page 79: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/79.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 86
More fractals:
• cardiovascular system: 3 (!)
• lungs: 2.9
![Page 80: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/80.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 87
Fractals & power laws:
appear in numerous settings:
• medical
• geographical / geological
• social
• computer-system related
![Page 81: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/81.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 88
More fractals:
• Coastlines: 1.2-1.58
1 1.1
1.3
![Page 82: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/82.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 89
![Page 83: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/83.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 90
More fractals:
• the fractal dimension for the Amazon river is 1.85 (Nile: 1.4)
[ems.gphys.unc.edu/nonlinear/fractals/examples.html]
![Page 84: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/84.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 91
More fractals:
• the fractal dimension for the Amazon river is 1.85 (Nile: 1.4)
[ems.gphys.unc.edu/nonlinear/fractals/examples.html]
![Page 85: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/85.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 92
More power laws
• Energy of earthquakes (Gutenberg-Richter law) [simscience.org]
log(freq)
magnitudeday
amplitude
![Page 86: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/86.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 93
Fractals & power laws:
appear in numerous settings:
• medical
• geographical / geological
• social
• computer-system related
![Page 87: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/87.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 94
More fractals:
stock prices (LYCOS) - random walks: 1.5
1 year 2 years
![Page 88: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/88.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 95
Even more power laws:
• Income distribution (Pareto’s law)• size of firms
• publication counts (Lotka’s law)
![Page 89: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/89.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 97
Even more power laws:
• web hit counts [w/ A. Montgomery]
Web Site Traffic
log(freq)
log(count)
Zipf“yahoo.com”
![Page 90: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/90.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 98
Fractals & power laws:
appear in numerous settings:
• medical
• geographical / geological
• social
• computer-system related
![Page 91: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/91.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 99
Power laws, cont’d
• In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER]
log indegree
- log(freq)
from [Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins ]
![Page 92: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/92.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 100
Power laws, cont’d
• In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER]
log indegree
log(freq)
from [Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins ]
![Page 93: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/93.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 101
“Foiled by power law”
• [Broder+, WWW’00]
(log) in-degree
(log) count
![Page 94: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/94.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 102
“Foiled by power law”
• [Broder+, WWW’00]
“The anomalous bump at 120on the x-axis is due a large clique formed by a single spammer”
(log) in-degree
(log) count
![Page 95: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/95.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 103
Power laws, cont’d
• In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER]
• length of file transfers [Crovella+Bestavros ‘96]
• duration of UNIX jobs [Harchol-Balter]
![Page 96: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/96.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 104
Even more power laws:
• Distribution of UNIX file sizes
• web hit counts [Huberman]
![Page 97: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/97.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 105
Road map
• Motivation – 3 problems / case studies
• Definition of fractals and power laws
• Solutions to posed problems
• More examples and tools
• Discussion - putting fractals to work!
• Conclusions – practitioner’s guide
• Appendix: gory details - boxcounting plots
![Page 98: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/98.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 106
What else can they solve?
• separability [KDD’02]• forecasting [CIKM’02]• dimensionality reduction [SBBD’00]• non-linear axis scaling [KDD’02]• disk trace modeling [Wang+’02]• selectivity of spatial/multimedia queries
[PODS’94, VLDB’95, ICDE’00]• ...
![Page 99: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/99.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 107
Settings for fractals:
Points; areas (-> fat fractals), eg:
![Page 100: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/100.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 108
Settings for fractals:
Points; areas, eg:
• cities/stores/hospitals, over earth’s surface
• time-stamps of events (customer arrivals, packet losses, criminal actions) over time
• regions (sales areas, islands, patches of habitats) over space
![Page 101: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/101.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 109
Settings for fractals:
• customer feature vectors (age, income, frequency of visits, amount of sales per visit)
‘good’ customers
‘bad’ customers
![Page 102: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/102.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 110
Some uses of fractals:
• Detect non-existence of rules (if points are uniform)
• Detect non-homogeneous regions (eg., legal login time-stamps may have different fd than intruders’)
• Estimate number of neighbors / customers / competitors within a radius
![Page 103: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/103.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 111
Multi-Fractals
Setting: points or objects, w/ some value, eg:– cities w/ populations– positions on earth and amount of gold/water/oil
underneath– product ids and sales per product – people and their salaries– months and count of accidents
![Page 104: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/104.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 112
Use of multifractals:
• Estimate tape/disk accesses– how many of the 100 tapes contain my 50
phonecall records?– how many days without an accident?
time
Tape#1 Tape# N
![Page 105: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/105.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 113
Use of multifractals
• how often do we exceed the threshold?
time
#bytes
Poisson
![Page 106: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/106.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 114
Use of multifractals cont’d
• Extrapolations for/from samples
time
#bytes
![Page 107: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/107.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 115
Use of multifractals cont’d
• How many distinct products account for 90% of the sales?
20% 80%
![Page 108: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/108.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 116
Road map
• Motivation – 3 problems / case studies
• Definition of fractals and power laws
• Solutions to posed problems
• More examples and tools
• Discussion - putting fractals to work!
• Conclusions – practitioner’s guide
• Appendix: gory details - boxcounting plots
![Page 109: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/109.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 117
Conclusions
• Real data often disobey textbook assumptions (Gaussian, Poisson, uniformity, independence)
![Page 110: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/110.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 118
Conclusions - cont’d
Self-similarity & power laws: appear in many cases
Bad news:
lead to skewed distributions
(no Gaussian, Poisson,
uniformity, independence,
mean, variance)
![Page 111: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/111.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 119
Conclusions - cont’d
Self-similarity & power laws: appear in many cases
Bad news:
lead to skewed distributions
(no Gaussian, Poisson,
uniformity, independence,
mean, variance)X
![Page 112: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/112.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 120
Conclusions
• tool#1: (for points) ‘correlation integral’: (#pairs within <= r) vs (distance r)
• tool#2: (for categorical values) rank-frequency plot (a’la Zipf)
• tool#3: (for numerical values) CCDF: Complementary cumulative distr. function (#of elements with value >= a )
![Page 113: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/113.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 121
Practitioner’s guide:• tool#1: #pairs vs distance, for a set of objects, with a
distance function (slope = intrinsic dimensionality)
log(hops)
log(#pairs)
2.8
log( r )
log(#pairs(within <= r))
1.51
internetMGcounty
![Page 114: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/114.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 122
Practitioner’s guide:• tool#2: rank-frequency plot (for categorical attributes)
log(rank)
log(degree)
-0.82
internet domains Biblelog(freq)
log(rank)
![Page 115: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/115.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 123
Practitioner’s guide:• tool#3: CCDF, for (skewed) numerical attributes, eg.
areas of islands/lakes, UNIX jobs...)
log(count( >= area))
log(area)
scandinavian lakes
![Page 116: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/116.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 124
Resources:
• Software for fractal dimension– www.cs.cmu.edu/~christos/software.html– And specifically ‘fdnq_h’:– www.cs.cmu.edu/~christos/SRC/fdnq_h.zip
• Also, in ‘R’: ‘fdim’ package
![Page 117: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/117.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 125
Books
• Strongly recommended intro book:– Manfred Schroeder Fractals, Chaos, Power
Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991
• Classic book on fractals:– B. Mandelbrot Fractal Geometry of Nature,
W.H. Freeman, 1977
![Page 118: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/118.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 126
References
• [vldb95] Alberto Belussi and Christos Faloutsos, Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension Proc. of VLDB, p. 299-310, 1995
• [Broder+’00] Andrei Broder, Ravi Kumar , Farzin Maghoul1, Prabhakar Raghavan , Sridhar Rajagopalan , Raymie Stata, Andrew Tomkins , Janet Wiener, Graph structure in the web , WWW’00
• M. Crovella and A. Bestavros, Self similarity in World wide web traffic: Evidence and possible causes , SIGMETRICS ’96.
![Page 119: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/119.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 127
References
– [ieeeTN94] W. E. Leland, M.S. Taqqu, W. Willinger, D.V. Wilson, On the Self-Similar Nature of Ethernet Traffic, IEEE Transactions on Networking, 2, 1, pp 1-15, Feb. 1994.
– [pods94] Christos Faloutsos and Ibrahim Kamel, Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension, PODS, Minneapolis, MN, May 24-26, 1994, pp. 4-13
![Page 120: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/120.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 128
References
– [vldb96] Christos Faloutsos, Yossi Matias and Avi Silberschatz, Modeling Skewed Distributions Using Multifractals and the `80-20 Law’ Conf. on Very Large Data Bases (VLDB), Bombay, India, Sept. 1996.
![Page 121: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/121.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 129
References
– [vldb96] Christos Faloutsos and Volker Gaede Analysis of the Z-Ordering Method Using the Hausdorff Fractal Dimension VLD, Bombay, India, Sept. 1996
– [sigcomm99] Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos, What does the Internet look like? Empirical Laws of the Internet Topology, SIGCOMM 1999
![Page 122: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/122.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 130
References
– [icde99] Guido Proietti and Christos Faloutsos, I/O complexity for range queries on region data stored using an R-tree International Conference on Data Engineering (ICDE), Sydney, Australia, March 23-26, 1999
– [sigmod2000] Christos Faloutsos, Bernhard Seeger, Agma J. M. Traina and Caetano Traina Jr., Spatial Join Selectivity Using Power Laws, SIGMOD 2000
![Page 123: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/123.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 131
References
- [Wang+’02] Mengzhi Wang, Anastassia Ailamaki and Christos Faloutsos, Capturing the spatio-temporal behavior of real traffic data Performance 2002 (IFIP Int. Symp. on Computer Performance Modeling, Measurement and Evaluation), Rome, Italy, Sept. 2002
![Page 124: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/124.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 132
Appendix - Gory details
• Bad news: There are more than one fractal dimensions– Minkowski fd; Hausdorff fd; Correlation fd;
Information fd
• Great news: – they can all be computed fast!– they usually have nearby values
![Page 125: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/125.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 133
Fast estimation of fd(s):
• How, for the (correlation) fractal dimension?
• A: Box-counting plot:
log( r )
rpi
log(sum(pi ^2))
![Page 126: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/126.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 134
Definitions
• pi : the percentage (or count) of points in the i-th cell
• r: the side of the grid
![Page 127: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/127.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 135
Fast estimation of fd(s):
• compute sum(pi^2) for another grid side, r’
log( r )
r’
pi’
log(sum(pi ^2))
![Page 128: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/128.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 136
Fast estimation of fd(s):• etc; if the resulting plot has a linear part, its
slope is the correlation fractal dimension D2
log( r )
log(sum(pi ^2))
![Page 129: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/129.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 137
Definitions (cont’d)
• Many more fractal dimensions Dq (related to Renyi entropies):
)log(
)log(
1)log(
)log(
1
1
1 r
ppD
qr
p
qD
ii
q
iq
![Page 130: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/130.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 138
Hausdorff or box-counting fd:
• Box counting plot: Log( N ( r ) ) vs Log ( r)
• r: grid side
• N (r ): count of non-empty cells
• (Hausdorff) fractal dimension D0:
)log(
))(log(0 r
rND
![Page 131: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/131.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 139
Definitions (cont’d)
• Hausdorff fd:r
log(r)
log(#non-empty cells)
D0
![Page 132: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/132.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 140
Observations
• q=0: Hausdorff fractal dimension
• q=2: Correlation fractal dimension (identical to the exponent of the number of neighbors vs radius)
• q=1: Information fractal dimension
![Page 133: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/133.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 141
Observations, cont’d
• in general, the Dq’s take similar, but not identical, values.
• except for perfectly self-similar point-sets, where Dq=Dq’ for any q, q’
![Page 134: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/134.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 142
Examples:MG county
• Montgomery County of MD (road end-points)
![Page 135: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/135.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 143
Examples:LB county
• Long Beach county of CA (road end-points)
![Page 136: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos](https://reader030.vdocuments.mx/reader030/viewer/2022032702/56649ccc5503460f94995fca/html5/thumbnails/136.jpg)
CMU SCS
15-826 Copyright: C. Faloutsos (2012) 144
Conclusions
• many fractal dimensions, with nearby values• can be computed quickly
(O(N) or O(N log(N))
• (code: on the web:–
www.cs.cmu.edu/~christos/SRC/fdnq_h.zip
– Or `R’ (‘fdim’ package)