![Page 1: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/1.jpg)
Indexing and Data Mining in Multimedia Databases
Christos Faloutsos
CMU www.cs.cmu.edu/~christos
![Page 2: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/2.jpg)
USC 2001 C. Faloutsos 2
Outline
Goal: ‘Find similar / interesting things’
• Problem - Applications
• Indexing - similarity search
• New tools for Data Mining: Fractals
• Conclusions
• Resources
![Page 3: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/3.jpg)
Problem
Given a large collection of (multimedia) records, find similar/interesting things, i.e.:
• Allow fast, approximate queries, and
• Find rules/patterns
![Page 4: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/4.jpg)
Sample queries
• Similarity search
– Find pairs of branches with similar sales patterns
– Find medical cases similar to Smith's
– Find pairs of sensor series that move in sync
– Find shapes like a spark-plug
![Page 5: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/5.jpg)
Sample queries – cont’d
• Rule discovery
– Clusters (of branches; of sensor data; ...)
– Forecasting (total sales for next year?)
– Outliers (e.g., unexpected part failures; fraud detection)
![Page 6: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/6.jpg)
Outline
Goal: ‘Find similar / interesting things’
• Problem - Applications
• Indexing - similarity search
• New tools for Data Mining: Fractals
• Conclusions
• Related projects @ CMU and resources
![Page 7: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/7.jpg)
Indexing - Multimedia
Problem:
• given a set of (multimedia) objects,
• find the ones similar to a desired query object
![Page 8: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/8.jpg)
[Figure: three sample time series, $price vs. day (1 to 365)]
distance function: by expert
![Page 9: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/9.jpg)
‘GEMINI’ - Pictorially
[Figure: sequences S1 ... Sn (day 1 to 365) are mapped to points F(S1) ... F(Sn) in feature space; feature axes: e.g., avg and e.g., std]
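A minimal Python sketch of the idea pictured above, assuming Euclidean distance between equal-length sequences and the two features named on the slide (avg and std). The function names and the sqrt(n) scaling are illustrative choices, not taken from the talk; the scaling makes the feature-space distance a lower bound of the true distance, so the cheap filter step should never dismiss a true match.

```python
import math

def features(seq):
    """Map a sequence to a 2-d point: (avg, std), each scaled by sqrt(n)."""
    n = len(seq)
    avg = sum(seq) / n
    std = math.sqrt(sum((v - avg) ** 2 for v in seq) / n)  # population std
    scale = math.sqrt(n)
    return (scale * avg, scale * std)

def range_query(database, query, eps):
    """Return indices of sequences within Euclidean distance eps of the query."""
    fq = features(query)
    hits = []
    for i, seq in enumerate(database):
        # Filter: cheap test in feature space (lower-bounds the true distance).
        if math.dist(features(seq), fq) > eps:
            continue
        # Refine: exact distance on the full sequences.
        if math.dist(seq, query) <= eps:
            hits.append(i)
    return hits

prices = [
    [10, 11, 12, 11, 10, 11],
    [10, 12, 12, 11, 10, 10],
    [50, 52, 49, 51, 50, 48],
]
print(range_query(prices, [10, 11, 12, 11, 10, 10], eps=2.0))
```

In a real system the feature points would be computed once and stored in a spatial access method, so the filter step becomes an index lookup instead of a scan.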
![Page 10: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/10.jpg)
Remaining issues
• how to extract features automatically?
• how to merge similarity scores from different media?
![Page 11: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/11.jpg)
Outline
Goal: ‘Find similar / interesting things’
• Problem - Applications
• Indexing - similarity search
– Visualization: FastMap
– Relevance feedback: FALCON
• Data Mining / Fractals
• Conclusions
![Page 12: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/12.jpg)
FastMap
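      O1   O2   O3   O4   O5
O1     0    1    1  100  100
O2     1    0    1  100  100
O3     1    1    0  100  100
O4   100  100  100    0    1
O5   100  100  100    1    0
[Figure: the objects drawn as points: two groups ~100 apart, members of each group ~1 apart; how to obtain such a map??]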
![Page 13: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/13.jpg)
FastMap
• Multi-dimensional scaling (MDS) can do that, but in O(N**2) time
• We want a linear algorithm: FastMap [SIGMOD95]
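A compact Python sketch of the FastMap idea, applied to the 5-object distance matrix from the previous slide. It assumes the usual formulation: pick two far-apart pivot objects, project every object onto the line through them with the cosine law, then repeat on the residual distances. The function name, the pivot heuristic details and the clamping of negative residuals are illustrative choices, not a copy of the [SIGMOD95] code.

```python
import math

def fastmap(dist, n, k):
    """Map n objects to k-d points, given only a distance function dist(i, j)."""
    coords = [[0.0] * k for _ in range(n)]

    def d2(i, j, col):
        # Squared distance left over after the first `col` coordinates.
        s = dist(i, j) ** 2
        for c in range(col):
            s -= (coords[i][c] - coords[j][c]) ** 2
        return max(s, 0.0)  # clamp: the input need not be exactly Euclidean

    for col in range(k):
        # Pivot heuristic: start anywhere, jump to the farthest object, twice.
        a = 0
        b = max(range(n), key=lambda j: d2(a, j, col))
        a = max(range(n), key=lambda j: d2(b, j, col))
        dab2 = d2(a, b, col)
        if dab2 == 0.0:
            break  # nothing left to separate
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine law: position of object i along the line a -> b.
            coords[i][col] = (d2(a, i, col) + dab2 - d2(b, i, col)) / (2 * dab)
    return coords

D = [
    [0, 1, 1, 100, 100],
    [1, 0, 1, 100, 100],
    [1, 1, 0, 100, 100],
    [100, 100, 100, 0, 1],
    [100, 100, 100, 1, 0],
]
coords = fastmap(lambda i, j: D[i][j], n=5, k=2)
for name, point in zip(["O1", "O2", "O3", "O4", "O5"], coords):
    print(name, [round(c, 3) for c in point])
```

Each coordinate needs a number of distance evaluations proportional to N (for a fixed k), which is where the linear cost comes from; the first coordinate alone already separates {O1, O2, O3} from {O4, O5}.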
![Page 14: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/14.jpg)
Applications: time sequences
• given n co-evolving time sequences
• visualize them + find rules [ICDE00]
[Figure: exchange rate vs. time for HKD, JPY, DEM]
![Page 15: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/15.jpg)
Applications - financial
• currency exchange rates [ICDE00]
[Figure: plot with labeled points USD(t), USD(t-5), FRF, GBP, JPY, HKD]
![Page 16: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/16.jpg)
Applications - financial
• currency exchange rates [ICDE00]
[Figure: plot with labeled points USD, HKD, JPY, FRF, DEM, GBP, USD(t), USD(t-5)]
![Page 17: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/17.jpg)
Application: VideoTrails
[ACM MM97]
![Page 18: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/18.jpg)
VideoTrails - usage
• scene-cut detection (about 10% errors)
• scene classification (e.g., dialogue vs action)
![Page 19: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/19.jpg)
Outline
Goal: ‘Find similar / interesting things’
• Problem - Applications
• Indexing - similarity search
– Visualization: FastMap
– Relevance feedback: FALCON
• Data Mining / Fractals
• Conclusions
![Page 20: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/20.jpg)
Merging similarity scores
• e.g., video: text, color, motion, audio
– weights change with the query!
• solution 1: user specifies weights
• solution 2: user gives examples
– and we ‘learn’ what he/she wants: rel. feedback (Rocchio, MARS, MindReader)
– but: how about disjunctive queries?
![Page 21: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/21.jpg)
‘FALCON’
[Figure: example patterns labeled ‘Inverted Vs’ and ‘Vs’]
Trader wants only ‘unstable’ stocks
![Page 22: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/22.jpg)
“Single query point” methods
[Figure: Rocchio: positive examples (+) and a single query point (x)]
![Page 23: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/23.jpg)
“Single query point” methods
[Figure: Rocchio, MARS and MindReader panels, each with positive examples (+) and a single query point (x)]
The averaging effect in action...
![Page 24: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/24.jpg)
Main idea: FALCON Contours
[Figure: contours around groups of positive examples (+); axes: feature1 (e.g., temperature) and feature2 (e.g., frequency)]
[Wu+, vldb2000]
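A small Python sketch contrasting a single-query-point score with a disjunctive one. The aggregate used here, a power mean of the distances to the ‘good’ examples with a negative exponent, is written from memory of [Wu+, vldb2000] and should be treated as an assumption; the value `alpha = -5`, the function names and the sample points are purely illustrative.

```python
import math

def centroid_distance(x, good):
    """Single-query-point score: distance to the average of the good examples."""
    center = [sum(g[d] for g in good) / len(good) for d in range(len(x))]
    return math.dist(x, center)

def aggregate_dissimilarity(x, good, alpha=-5.0):
    """Power mean of the distances to the good examples.

    A negative alpha behaves like a soft 'OR': being close to any one
    good example is enough to get a small score.
    """
    dists = [math.dist(x, g) for g in good]
    if any(d == 0.0 for d in dists):
        return 0.0  # x coincides with a good example
    mean_pow = sum(d ** alpha for d in dists) / len(dists)
    return mean_pow ** (1.0 / alpha)

# Two separate groups of good examples (a disjunctive query).
good = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
near_group = (10.5, 10.0)   # sits inside one group
in_between = (5.5, 5.0)     # sits in the empty region between the groups
for name, score in [("centroid", centroid_distance),
                    ("aggregate", aggregate_dissimilarity)]:
    print(name, round(score(near_group, good), 2), round(score(in_between, good), 2))
```

The centroid score prefers the empty middle (the averaging effect), while the aggregate score prefers the point that lies inside one of the groups, which is the behaviour the contours suggest.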
![Page 25: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/25.jpg)
Conclusions for indexing + visualization
• GEMINI: fast indexing, exploiting off-the-shelf SAMs
• FastMap: automatic feature extraction in O(N) time
• FALCON: relevance feedback for disjunctive queries
![Page 26: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/26.jpg)
Outline
Goal: ‘Find similar / interesting things’
• Problem - Applications
• Indexing - similarity search
• New tools for Data Mining: Fractals
• Conclusions
• Resources
![Page 27: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/27.jpg)
Data mining & fractals – Road map
• Motivation – problems / case study
• Definition of fractals and power laws
• Solutions to posed problems
• More examples
![Page 28: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/28.jpg)
Problem #1 - spatial d.m.
Galaxies (Sloan Digital Sky Survey w/ B. Nichol) - ‘spiral’ and ‘elliptical’ galaxies
(stores & households; mpg & MTBF...)
- patterns? (not Gaussian; not uniform)
- attraction/repulsion?
- separability??
![Page 29: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/29.jpg)
Problem #2: dim. reduction
• given attributes x1, ... xn
– possibly, non-linearly correlated
• drop the useless ones
(Q: why?
A: to avoid the ‘dimensionality curse’)
![Page 30: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/30.jpg)
Answer:
• Fractals / self-similarities / power laws
![Page 31: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/31.jpg)
What is a fractal?
= self-similar point set, e.g., Sierpinski triangle:
...zero area;
infinite length!
![Page 32: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/32.jpg)
Definitions (cont’d)
• Paradox: Infinite perimeter ; Zero area!
• ‘dimensionality’: between 1 and 2
• actually: Log(3)/Log(2) = 1.58… (long story)
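The ‘long story’ can be compressed into the standard similarity-dimension argument (a textbook derivation, not taken from the slides): each refinement step replaces a triangle with N = 3 copies, each scaled by s = 1/2, which also explains the paradox above.

```latex
D = \frac{\log N}{\log(1/s)} = \frac{\log 3}{\log 2} \approx 1.585,
\qquad
\text{perimeter after } k \text{ steps} \propto \left(\tfrac{3}{2}\right)^{k} \to \infty,
\qquad
\text{area after } k \text{ steps} \propto \left(\tfrac{3}{4}\right)^{k} \to 0 .
```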
![Page 33: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/33.jpg)
Intrinsic (‘fractal’) dimension
• Q: fractal dimension of a line?
[Figure: points on a line in the x-y plane, e.g., (x, y) = (5, 1), (4, 2), (3, 3), (2, 4)]
E.g.: #cylinders; miles / gallon
![Page 34: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/34.jpg)
Intrinsic (‘fractal’) dimension
• Q: fractal dimension of a line?
• A: nn ( <= r ) ~ r^1 (‘power law’: y=x^a)
![Page 35: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/35.jpg)
• Q: fd of a plane?
• A: nn ( <= r ) ~ r^2
fd == slope of ( log(nn) vs log(r) )
![Page 36: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/36.jpg)
Sierpinski triangle
[Figure: log(#pairs within <=r ) vs. log( r ); slope 1.58]
== ‘correlation integral’
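A short Python sketch of the slope estimate described on the last few slides: count the pairs within distance r for several radii, then fit a line in log-log scale. The sample sizes, the radii, the chaos-game generator for the Sierpinski triangle and all function names are assumptions made for illustration. On finite samples the estimates are only approximate, and boundary effects tend to pull them slightly below the ideal values of 1, 2 and 1.58.

```python
import bisect
import itertools
import math
import random

def correlation_dimension(points, radii):
    """Slope of log(#pairs within r) vs. log(r), fitted by least squares."""
    dists = sorted(math.dist(p, q) for p, q in itertools.combinations(points, 2))
    xs, ys = [], []
    for r in radii:
        pairs = bisect.bisect_right(dists, r)  # number of pairs at distance <= r
        if pairs > 0:
            xs.append(math.log(r))
            ys.append(math.log(pairs))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def sierpinski(n, rng):
    """Chaos game: repeatedly jump halfway toward a randomly chosen corner."""
    corners = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
    x, y = 0.25, 0.25
    points = []
    for step in range(n + 20):
        cx, cy = rng.choice(corners)
        x, y = (x + cx) / 2, (y + cy) / 2
        if step >= 20:  # drop the first few points, which may lie off the set
            points.append((x, y))
    return points

rng = random.Random(0)
n = 800
radii = [0.02 * 2 ** (k / 2) for k in range(7)]  # 0.02 ... 0.16, even in log(r)
line = [(t, 1 - t) for t in (rng.random() for _ in range(n))]
plane = [(rng.random(), rng.random()) for _ in range(n)]
triangle = sierpinski(n, rng)
for name, pts in [("line", line), ("plane", plane), ("Sierpinski", triangle)]:
    print(name, round(correlation_dimension(pts, radii), 2))
```

This version compares all pairs, so it is quadratic in the number of points; it is meant to show the definition, not to scale.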
![Page 37: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/37.jpg)
Road map
• Motivation – problems / case studies
• Definition of fractals and power laws
• Solutions to posed problems
• More examples
• Conclusions
![Page 38: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/38.jpg)
Solution #1: spatial d.m.
Galaxies (Sloan Digital Sky Survey w/ B. Nichol - ‘BOPS’ plot - [sigmod2000])
• clusters?
• separable?
• attraction/repulsion?
• data ‘scrubbing’ – duplicates?
![Page 39: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/39.jpg)
Solution #1: spatial d.m.
[Figure: log(#pairs within <=r ) vs. log(r), one curve each for spi-spi, spi-ell and ell-ell pairs]
- 1.8 slope
- plateau!
- repulsion!
![Page 40: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/40.jpg)
[w/ Seeger, Traina, Traina, SIGMOD00]
![Page 41: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/41.jpg)
spatial d.m.
[Figure: radii r1 and r2 marked on the point set and on the pair-count plot]
Heuristic on choosing # of clusters
![Page 42: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/42.jpg)
![Page 43: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/43.jpg)
Solution #1: spatial d.m.
[Figure: log(#pairs within <=r ) vs. log(r), one curve each for spi-spi, spi-ell and ell-ell pairs]
- 1.8 slope
- plateau!
- repulsion!!
- duplicates
![Page 44: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/44.jpg)
Problem #2: Dim. reduction
[Figure: three x-y scatter plots on [0, 1] axes: (a) Quarter-circle, (b) Line, (c) Spike]
![Page 45: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/45.jpg)
Solution:
• drop the attributes that don’t increase the ‘partial f.d.’ PFD
• dfn: PFD of attribute set A is the f.d. of the projected cloud of points [w/ Traina, Traina, Wu, SBBD00]
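A greedy backward-elimination sketch of the rule above, in Python. It is one plausible reading of ‘drop the attributes that don’t increase the PFD’, not necessarily the algorithm of [SBBD00]; the `tolerance` threshold, the synthetic data (attribute `b` is a non-linear function of `a`, attribute `c` is independent) and all function names are assumptions.

```python
import bisect
import itertools
import math
import random

def fractal_dimension(points, radii):
    """Correlation f.d.: slope of log(#pairs within r) vs. log(r)."""
    dists = sorted(math.dist(p, q) for p, q in itertools.combinations(points, 2))
    xs = [math.log(r) for r in radii]
    ys = [math.log(max(bisect.bisect_right(dists, r), 1)) for r in radii]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def pfd(rows, attrs, radii):
    """Partial f.d.: the f.d. of the cloud projected onto the attributes in attrs."""
    return fractal_dimension([tuple(row[a] for a in attrs) for row in rows], radii)

def drop_useless(rows, attrs, radii, tolerance=0.3):
    """Drop attributes one at a time while the PFD of the rest barely changes."""
    kept = list(attrs)
    while len(kept) > 1:
        current = pfd(rows, kept, radii)
        # PFD of the remaining attributes after removing each candidate in turn.
        trials = {a: pfd(rows, [b for b in kept if b != a], radii) for a in kept}
        candidate = max(trials, key=trials.get)
        if current - trials[candidate] > tolerance:
            break  # every remaining attribute contributes to the f.d.
        kept.remove(candidate)
    return kept

rng = random.Random(1)
rows = []
for _ in range(400):
    a, c = rng.random(), rng.random()
    rows.append({"a": a, "b": a * a, "c": c})  # b is non-linearly tied to a
radii = [0.04 * 2 ** (k / 2) for k in range(7)]
kept = drop_useless(rows, ["a", "b", "c"], radii)
print(kept)
```

Because `b` is fully determined by `a`, the cloud is intrinsically 2-dimensional, so one of the two correlated attributes can go while `c` has to stay.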
![Page 46: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/46.jpg)
Problem #2: dim. reduction
[Figure: three x-y scatter plots on [0, 1] axes: (a) Quarter-circle, (b) Line, (c) Spike; annotations: PFD~1, PFD~1, global FD=1, PFD=1, PFD=0, PFD=1]
![Page 47: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/47.jpg)
Problem #2: dim. reduction
[Figure: three x-y scatter plots on [0, 1] axes: (a) Quarter-circle, (b) Line, (c) Spike; annotations: PFD~1, PFD=1, global FD=1, PFD=1, PFD=0, PFD=1]
Notice: ‘max variance’ would fail here
![Page 48: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/48.jpg)
Problem #2: dim. reduction
[Figure: three x-y scatter plots on [0, 1] axes: (a) Quarter-circle, (b) Line, (c) Spike; annotations: PFD~1, PFD~1, global FD=1, PFD=1, PFD=0, PFD=1]
Notice: SVD would fail here
![Page 49: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/49.jpg)
Road map
• Motivation – problems / case studies
• Definition of fractals and power laws
• Solutions to posed problems
• More examples
– fractals
– power laws
• Conclusions
![Page 50: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/50.jpg)
disk traffic
• Not Poisson, not(?) iid - BUT: self-similar
• How to model it?
[Figure: #bytes vs. time]
![Page 51: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/51.jpg)
traffic
• disk traces (80-20 ‘law’ = ‘multifractal’ [ICDE’02])
[Figure: #bytes vs. time, with a 20% / 80% split]
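A tiny Python sketch of the 80-20 ‘law’ as a generator of bursty traffic: split the total volume 80/20 between the two halves of the time interval, then recurse on each half. Whether this matches the model of [ICDE’02] in every detail is an assumption; the function name `b_model` and the parameters `bias`, `levels` and `seed` are illustrative.

```python
import random

def b_model(total, levels, bias=0.8, seed=0):
    """Return 2**levels per-interval volumes that sum to `total`."""
    rng = random.Random(seed)
    series = [float(total)]
    for _ in range(levels):
        finer = []
        for volume in series:
            big, small = bias * volume, (1 - bias) * volume
            # The heavier share falls on a randomly chosen half.
            finer.extend((big, small) if rng.random() < 0.5 else (small, big))
        series = finer
    return series

trace = b_model(total=1000.0, levels=10)
print(len(trace), round(sum(trace), 6), round(max(trace), 3))
```

With `bias=0.5` every interval receives the same volume; the further the bias is from 0.5, the burstier the trace.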
![Page 52: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/52.jpg)
Traffic
Many other time-sequences are bursty/clustered: (such as?)
![Page 53: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/53.jpg)
Tape accesses
[Figure: accesses over time across Tape #1 ... Tape #N]
# tapes needed, to retrieve n records?
(# days down, due to failures / hurricanes / communication noise...)
![Page 54: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/54.jpg)
Tape accesses
[Figure: accesses over time across Tape #1 ... Tape #N]
[Figure: # tapes retrieved vs. # qual. records; curves: 50-50 = Poisson, and real]
![Page 55: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/55.jpg)
More apps: Brain scans
• Oct-trees; brain-scans
[Figure: Log(#octants) vs. octree levels; slope 2.63 = fd]
![Page 56: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/56.jpg)
Cross-roads of Montgomery county:
• any rules?
[Figure: GIS points]
![Page 57: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/57.jpg)
GIS
A: self-similarity:
• intrinsic dim. = 1.51
• avg#neighbors(<= r ) = r^D
[Figure: log(#pairs(within <= r)) vs. log( r ); slope 1.51]
![Page 58: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/58.jpg)
Examples: LB county
• Long Beach county of CA (road end-points)
![Page 59: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/59.jpg)
More fractals:
• cardiovascular system: 3 (!)
• stock prices (LYCOS) - random walks: 1.5
• Coastlines: 1.2-1.58 (?)
[Figure: stock price plots over 1 year and 2 years]
![Page 60: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/60.jpg)
![Page 61: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/61.jpg)
Road map
• Motivation – problems / case studies
• Definition of fractals and power laws
• Solutions to posed problems
• More examples
– fractals
– power laws
• Conclusions
![Page 62: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/62.jpg)
Fractals <-> Power laws
self-similarity ->
• <=> fractals
• <=> scale-free
• <=> power-laws (y=x^a, F=C*r^(-2))
[Figure: log(#pairs within <=r ) vs. log( r ); slope 1.58]
![Page 63: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/63.jpg)
Zipf’s law
Bible - RANK-FREQUENCY plot (in log-log scales)
Zipf’s (first) Law:
[Figure: log(freq) vs. log(rank); the top-ranked words are “the” and “and”]
![Page 64: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/64.jpg)
Zipf’s law
• similarly for first names (slope ~-1)
• last names (~ -0.7)
• etc
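The slopes quoted above come from a rank-frequency plot; a minimal Python sketch of that measurement follows. The synthetic corpus is made up so that the word of rank r occurs exactly 840 / r times, which should give a slope of about -1; the function name and the corpus are illustrative, not data from the talk.

```python
import math
from collections import Counter

def rank_frequency_slope(tokens):
    """Least-squares slope of log(freq) vs. log(rank)."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# Synthetic corpus: the word of rank r occurs 840 // r times (exact for r = 1..8).
tokens = [f"w{r}" for r in range(1, 9) for _ in range(840 // r)]
print(round(rank_frequency_slope(tokens), 3))
```

On real text the same function can be fed a list of lower-cased words; the fit is then only approximate, especially in the tail of rare words.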
![Page 65: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/65.jpg)
More power laws
• Energy of earthquakes (Gutenberg-Richter law) [simscience.org]
[Figure: amplitude vs. day; log(count) vs. magnitude]
![Page 66: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/66.jpg)
<url, u-id, ....>
Web Site Traffic
Clickstream data
[Figure: log(count) vs. log(freq); Zipf]
![Page 67: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/67.jpg)
Lotka’s law
• library science (Lotka’s law of publication count); and citation counts: (citeseer.nj.nec.com 6/2001)
[Figure: log(count) vs. log(#citations); J. Ullman marked]
![Page 68: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/68.jpg)
Korcak’s law
Scandinavian lakes area vs complementary cumulative count (log-log axes)
[Figure: log(count( >= area)) vs. log(area)]
![Page 69: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/69.jpg)
More power laws: Korcak
Japan islands; area vs cumulative count (log-log axes)
[Figure: log(count( >= area)) vs. log(area)]
![Page 70: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/70.jpg)
(Korcak’s law: Aegean islands)
![Page 71: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/71.jpg)
Olympic medals:
[Figure: log(# medals) vs. log rank, with linear fit y = -0.9676x + 2.3054 (R^2 = 0.9458); labeled points: USA, China, Russia]
![Page 72: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/72.jpg)
SALES data – store#96
[Figure: count of products vs. # units sold]
![Page 73: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/73.jpg)
TELCO data
[Figure: count of customers vs. # of service units]
![Page 74: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/74.jpg)
More power laws on the Internet
degree vs rank, for Internet domains (log-log) [sigcomm99]
[Figure: log(degree) vs. log(rank); slope -0.82]
![Page 75: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/75.jpg)
Even more power laws:
• Income distribution (Pareto’s law);
• duration of UNIX jobs [Harchol-Balter]
• Distribution of UNIX file sizes
• Web graph [CLEVER-IBM; Barabasi]
![Page 76: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/76.jpg)
Overall Conclusions:
‘Find similar/interesting things’ in multimedia databases
• Indexing: feature extraction (‘GEMINI’)
– automatic feature extraction: FastMap
– Relevance feedback: FALCON
![Page 77: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/77.jpg)
Conclusions - cont’d
• New tools for Data Mining: Fractals/power laws:
– appear everywhere
– lead to skewed distributions (not Gaussian, Poisson, uniform, or independent)
– ‘correlation integral’ for separability/cluster detection
– PFD for dimensionality reduction
![Page 78: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/78.jpg)
Resources:
• Software and papers:
– www.cs.cmu.edu/~christos
– Fractal dimension (FracDim)
– Separability (sigmod 2000, kdd2001)
– Relevance feedback for query by content (FALCON – vldb 2000)
![Page 79: Indexing and Data Mining in Multimedia Databases](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54c67abc4a7959b6298b463b/html5/thumbnails/79.jpg)
Resources
• Manfred Schroeder, “Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise”