Statistical Learning Methods for Emerging Database Applications
Edward Chang
Associate Professor, Electrical Engineering, UC Santa Barbara
CTO, VIMA Technologies
DASFAA Tutorial, Kyoto, March 27, 2003
Useful Links
Related publications: http://www-db.stanford.edu/~echang/
Software free trial: http://www.imagebeagle.com
Locate objectionable images on your hard drives, before your boss finds them!
Outline
Statistical Learning
Emerging Applications: Data Characteristics
Classical Models
Kernel Methods
- Linear Model View
- Nearest Neighbor View
- Geometric View
Dimension Reduction Methods
Statistical Learning
Program computers to learn! Computers improve performance with experience at some task.
Example:
- Task: playing checkers
- Performance: % of games it wins
- Experience: playing against expert players
Statistical Learning
Task: Ŷ = f(U)
- Represented by some model(s)
- Implies a hypothesis
Performance
- Measured by error functions
Experience (L)
- Characterized by training data
Algorithm (Φ)
Supervised Learning
X: Data
U: Unlabeled pool; L: Labeled pool
G: Labels (regression or classification)
Φ: Learning algorithm
f = Φ(L), Ŷ = f(U)
Learning Algorithms
Linear Model
K-NN
Neural Networks
Decision Trees
Kernel Methods
Etc.
Classical Model
N: Number of training instances (N+, N-)
D: Dimensionality
N >> D, N → ∞ (e.g., PAC learnability)
N- ≈ N+
Emerging DB Applications
N < D
N+ << N-
Examples:
- Information retrieval with relevance feedback
- Gene profiling
Image Retrieval Demo
N < D: N < 50, D = 150
N+ << N-
(ACM SIGMOD 01; ACM MM 01, 02; IEEE CVPR 03)
SVMactive
[Demo screenshots over several slides]
Ranking
Gene Profiling Example: N = 59 cases, D = 4026 genes
Outline
Statistical Learning
Emerging Applications: Data Characteristics
Classical Models (Classification)
Kernel Methods
- Linear Model View
- Nearest Neighbor View
- Geometric View
Dimension Reduction Methods
Linear Model
Y = β0 + Σj=1..p βjXj, i.e., Y = XTβ
RSS(β) = (y − Xβ)T(y − Xβ)   (RSS: residual sum of squares)
β = (XTX)-1XTy (a sketch follows)
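A minimal NumPy sketch of the closed-form solution above, on synthetic data; the data set and variable names are illustrative, not from the tutorial:

```python
# Least squares: beta = (X^T X)^{-1} X^T y, on a small synthetic problem.
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 3                                              # N instances, p features
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, p))])  # prepend intercept column
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=N)               # linear signal plus noise

# Closed form; requires X^T X to be invertible (N >= D, full rank)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

rss = (y - X @ beta_hat) @ (y - X @ beta_hat)              # residual sum of squares
print(beta_hat, rss)
```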
Linear Model
Maximum Likelihood
Y = β0 + Σj=1..p βjXj = XTβ
Y = XTβ + ε
The noise terms ε are independent: ε ~ N(0, σ2)
P(y | βx) is a normal distribution with mean βx and variance σ2
Linear Model
P(y | βx) ~ N(βx, σ2)
Training: given (x1,y1), (x2,y2), …, (xn,yn), infer P(β | x1,…,xn, y1,…,yn) by Bayes' rule, or by the maximum likelihood estimate
Maximum Likelihood
For what β is P(y1,…,yn | x1,…,xn, β) maximized?
- Π P(yi | βxi) maximized?
- Π exp(-½((yi − βxi)/σ)2) maximized?
- Σ -½((yi − βxi)/σ)2 maximized?
- Σ (yi − βxi)2 minimized?
Least Squares Linear Model
Solution method #1:
RSS(β) = (y − Xβ)T(y − Xβ)
β = (XTX)-1XTy
Solution method #2 (for D > N):
- Gradient descent
- Perceptron
Other Linear Models
LDA: find the projection direction that minimizes the overlap between two Gaussian class distributions
Separating hyperplane
LDA
Separating Hyperplane
Maximum Margin Hyperplane
Linear Model Fits All Data?
How about Joining the Dots?
Ŷ(x) = (1/k) Σ yi over xi ∈ Nk(x), with k = 1 (see the sketch below)
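A minimal sketch of this joining-the-dots estimator in NumPy, assuming Euclidean distance; the tiny data set is illustrative:

```python
# k-NN: y_hat(x) = (1/k) * sum of y_i over the k nearest x_i.
import numpy as np

def knn_predict(X_train, y_train, x, k=1):
    d = np.linalg.norm(X_train - x, axis=1)   # distance to every training point
    nearest = np.argsort(d)[:k]               # indices of the k nearest neighbors
    return y_train[nearest].mean()            # average their labels (k=1 "joins the dots")

X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 1.0, 0.0])
print(knn_predict(X_train, y_train, np.array([1.4]), k=1))  # exactly follows nearest point
print(knn_predict(X_train, y_train, np.array([1.4]), k=3))  # smoother average of 3 nearest
```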
Linear Models
N ≥ D: Least squares, LDA
D > N: Perceptron, maximum margin hyperplane
Linear Model Fits All?
NN with k = 1
Nearest Neighbor
Four things make a memory-based learner:
1. A distance function
2. K: how many neighbors to consider?
3. A weighting function (optional)
4. How to fit with the local points?
Problems
Fitting noise
Jagged boundaries
Solutions
Fitting noise: pick a larger K?
Jagged boundaries: introduce a kernel as a weighting function
NN with k = 15
3/27/2003 DASFAA Tutorial, Kyoto 39
NN
3/27/2003 DASFAA Tutorial, Kyoto 40
Nearest Neighbor -> Kernel Method
Four things make a memory-based learner:
1. A distance function
2. K: how many neighbors to consider? All of them
3. A weighting function: RBF kernels
4. How to fit with the local points? Predict using the kernel weights
Kernel Method
RBF weighting function
- The kernel width holds the key
- Use cross validation to find the "optimal" width (see the sketch below)
Fitting with the local points: where NN meets the Linear Model
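A hedged sketch of the RBF-weighted predictor with the width chosen by leave-one-out cross validation; plain NumPy, and the specific CV loop, data set, and candidate widths are assumptions for illustration:

```python
# Kernel-weighted prediction: every training point votes, weighted by an RBF
# of its distance; the width is picked by leave-one-out cross validation.
import numpy as np

def rbf_predict(X_train, y_train, x, width):
    d2 = np.sum((X_train - x) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * width ** 2))      # RBF weight per training point
    return np.sum(w * y_train) / np.sum(w)    # weighted average of labels

def loo_error(X, y, width):
    errs = []
    for i in range(len(X)):
        mask = np.arange(len(X)) != i         # leave example i out
        yhat = rbf_predict(X[mask], y[mask], X[i], width)
        errs.append((yhat - y[i]) ** 2)
    return np.mean(errs)

rng = np.random.default_rng(1)
X = rng.uniform(0, 6, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)
widths = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0]
best = min(widths, key=lambda s: loo_error(X, y, s))  # "optimal" width by CV
print(best)
```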
LM vs. NN
Linear Model: f(x) is approximated by a global linear function; more stable, less flexible
Nearest Neighbor: k-NN assumes f(x) is well approximated by a locally constant function; less stable, more flexible
Between LM and NN: the other models…
Decision Theories
Bias & Variance Tradeoff
Bayes Prediction
VC Dimension
PAC Learnability
Variance vs. Bias
MSE(x0) = ET[f(x0) − ŷ0]2
        = ET[ŷ0 − ET(ŷ0)]2 + [ET(ŷ0) − f(x0)]2
Error = VarT(ŷ0) + Bias2(ŷ0)
Outline
Statistical Learning
Emerging Applications: Data Characteristics
Classical Models (Classification)
Kernel Methods
Dimension Reduction Methods
Where Are We and Where Am I Heading To?
LM and NN
Kernel Method from three views:
- LM view
- NN view
- Geometric view
Linear Model View
Y = β0 + Σ βjXj
Separating hyperplane:
max||β||=1 C subject to yi f(xi) ≥ C, i.e., yi(β0 + βTxi) ≥ C
Separating Hyperplane
Maximum Margin Hyperplane
Classifier Margin
Margin: the width of the boundary before hitting a data object
Maximum margin tends to minimize classification variance (no formal theory for this yet)
Separating Hyperplane
M’s Mathematical Representation
Plus-plane: {x : wx + b = +1}
Minus-plane: {x : wx + b = -1}
w ⊥ plus-plane: w(u − v) = 0 if u and v are on the plus-plane
w ⊥ minus-plane
Separating Hyperplane
M
Let x- be any point on the minus-plane
Let x+ be the closest plus-plane point to x-
x+ = x- + λw (why? because the line x+x- ⊥ minus-plane)
M = |x+ − x-|
M
1. wx- + b = -1
2. wx+ + b = 1
3. x+ = x- + λw
4. M = |x+ − x-|
5. w(x- + λw) + b = 1 (from 2 & 3)
6. wx- + b + λww = 1
7. λww = 2
M
1. λww = 2
2. λ = 2/(ww)
3. M = |x+ − x-| = |λw| = λ|w| = 2/|w|
4. Max M: gradient descent, simulated annealing, EM, Newton's method?
Max M
Max M = 2/|w| ⇔ min |w|/2 ⇔ min |w|2/2
subject to yi(xiw + b) ≥ 1, i = 1,…,N
A quadratic criterion with linear inequality constraints (see the sketch below)
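A minimal sketch of that constrained problem, handed to SciPy's generic SLSQP solver purely to illustrate the form; a real SVM would use a dedicated QP or SMO solver, and the toy data is an assumption:

```python
# Primal max-margin problem: minimize |w|^2 / 2 subject to y_i (x_i . w + b) >= 1.
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 1.0], [2.0, 2.5], [0.0, 2.0],        # class +1
              [-1.0, -1.0], [-2.0, -0.5], [0.0, -2.0]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1], dtype=float)

def objective(theta):                 # theta = (w1, w2, b)
    w = theta[:2]
    return 0.5 * w @ w

constraints = [{'type': 'ineq',       # SLSQP convention: fun(theta) >= 0
                'fun': lambda theta, i=i: y[i] * (X[i] @ theta[:2] + theta[2]) - 1.0}
               for i in range(len(X))]

res = minimize(objective, x0=np.zeros(3), method='SLSQP', constraints=constraints)
w, b = res.x[:2], res.x[2]
print(w, b, 2.0 / np.linalg.norm(w))  # margin M = 2 / |w|
```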
Max M
Min |w|2/2 subject to yi(xiw + b) ≥ 1, i = 1,…,N
Lp = minw,b |w|2/2 − Σi=1..N αi[yi(xiw + b) − 1]
Setting the derivatives to zero:
w = Σi=1..N αiyixi
0 = Σi=1..N αiyi
Wolfe Dual
Ld = Σi=1..N αi − ½ Σi,j=1..N αiαjyiyj(xi·xj)
Subject to αi ≥ 0 and αi[yi(xiw + b) − 1] = 0 (the KKT conditions):
- αi > 0 ⇒ yi(xiw + b) = 1 (support vectors)
- αi = 0 ⇒ yi(xiw + b) > 1
Class Prediction
yq = w·xq + b
w = Σi=1..N αiyixi
yq = sign(Σi=1..N αiyi(xi·xq) + b)
Non-separable Classes
Soft margin hyperplane
Basis expansion
Non-separable Case
Soft Margin SVMs
Hard margin: min |w|2/2 subject to yi(xiw + b) ≥ 1, i = 1,…,N
Soft margin: min |w|2/2 + C Σ εi subject to
- xiw + b ≥ +1 − εi if yi = +1
- xiw + b ≤ -1 + εi if yi = -1
- εi ≥ 0
(see the sketch below)
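A hedged sketch using scikit-learn's SVC (the tutorial names no library, so this choice is an assumption): fit a soft-margin linear SVM, then reproduce its decision values by hand from the dual form yq = sign(Σ αi yi (xi·xq) + b):

```python
# Soft-margin linear SVM; C bounds the dual variables: 0 <= alpha_i <= C.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=+1.5, size=(30, 2)),
               rng.normal(loc=-1.5, size=(30, 2))])
y = np.array([1] * 30 + [-1] * 30)

clf = SVC(kernel='linear', C=1.0)
clf.fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors only, so the
# decision value is sum_i alpha_i y_i (x_i . x_q) + b over support vectors.
scores = clf.dual_coef_ @ (clf.support_vectors_ @ X.T) + clf.intercept_
print(np.allclose(scores.ravel(), clf.decision_function(X)))  # True
print(len(clf.support_vectors_), "support vectors")
```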
Non-separable Case
Wolfe Dual
Ld = Σi=1..N αi − ½ Σi,j=1..N αiαjyiyj(xi·xj)
Subject to C ≥ αi ≥ 0 and Σ αiyi = 0 (KKT conditions)
yq = sign(Σi=1..N αiyi(xi·xq) + b)
Basis Function
3/27/2003 DASFAA Tutorial, Kyoto 69
Harder 1D Example
3/27/2003 DASFAA Tutorial, Kyoto 70
Basis Function
Φ(X) = (x, x2)
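A minimal sketch of this lift: 1-D points that no single threshold separates become linearly separable as (x, x2); the tiny data set is illustrative:

```python
# Basis expansion Phi(x) = (x, x^2) makes a "middle vs. outside" 1-D problem
# linearly separable in the lifted 2-D space.
import numpy as np

x = np.array([-3.0, -2.0, -0.5, 0.5, 2.0, 3.0])
y = np.array([1, 1, -1, -1, 1, 1])        # middle points form the negative class

phi = np.column_stack([x, x ** 2])        # lift each point to (x, x^2)

# In lifted coordinates the rule x^2 >= 2.0 is a straight line separating the
# classes perfectly, which no single threshold on x alone can do.
pred = np.where(phi[:, 1] >= 2.0, 1, -1)
print(np.array_equal(pred, y))            # True
```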
Harder 1D Example
Some Basis Functions
Φ(X) = Σ γmhm(X), where each hm: Rp → R
Common choices: polynomials, radial basis functions, sigmoid functions
Wolfe Dual with Basis Expansion
Ld = Σi=1..N αi − ½ Σi,j=1..N αiαjyiyj Φ(xi)·Φ(xj)
Subject to C ≥ αi ≥ 0 and Σ αiyi = 0 (KKT conditions)
yq = sign(Σi=1..N αiyi(Φ(xi)·Φ(xq)) + b)
K(xi, xj) = Φ(xi)·Φ(xj): the kernel function!
Quadratic Basis Functions
Φ(X) = {1, xi, xixj}, i, j = 1..p
(p+1)(p+2)/2 terms, i.e., O(p2) features and O(p2) cost per dot product
The dot product is equivalent to (xi·xj + 1)2, computable at O(p) cost
Total cost over all training pairs: O(N2p) (see the numeric check below)
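A quick numeric check of that equivalence; the √2 scaling on the linear terms is the standard bookkeeping needed for the identity to hold exactly, and everything here is illustrative:

```python
# Kernel trick: the explicit quadratic basis expansion (O(p^2) features) and
# the kernel (x . z + 1)^2 (O(p) work) produce identical dot products.
import numpy as np

def phi(x):
    # (1, sqrt(2)*x_i, x_i*x_j over all ordered pairs) -> 1 + p + p^2 features
    return np.concatenate([[1.0], np.sqrt(2.0) * x, np.outer(x, x).ravel()])

rng = np.random.default_rng(3)
x, z = rng.normal(size=5), rng.normal(size=5)

explicit = phi(x) @ phi(z)           # O(p^2) feature-space dot product
kernel = (x @ z + 1.0) ** 2          # O(p) kernel evaluation
print(np.isclose(explicit, kernel))  # True
```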
Dot Product Saves the Day
With the kernel trick: O(N2p) for any degree
Explicit basis expansion: quadratic O(N2p2), cubic O(N2p3), quartic O(N2p4)
Quiz
What is the signature of a polynomial kernel function of degree d?
(xi·xj + 1)d
Nearest Neighbor View
Z: a set of zero-mean jointly Gaussian random variables
Each Zi corresponds to one example xi
Cov(zi, zj) = K(xi, xj)
yi, the label of zi, is +1 or -1
P(yi | zi) = σ(yi zi)
Training Data
General Kernel Classifier [Jaakkola et al. 99]
MAP classification for xt:
yt = sign(Σ αi yi K(xt, xi)), where K(xi, xj) = Cov(zi, zj) (some similarity function)
Supervised training: compute the αi given X and y and an error function such as
J(α) = -½ Σ αi αj yi yj K(xi, xj) + Σ F(αi)
Leave One Out
3/27/2003 DASFAA Tutorial, Kyoto 81
SVMs
yt = sign(Σ αi yi K(xt, xi))
(yi, xi): training data; αi nonnegative; kernel K positive definite
The αi are obtained by maximizing J(α) = -½ Σ αi αj yi yj K(xi, xj) + Σ F(αi) with F(αi) = αi,
subject to αi ≥ 0 and Σ yiαi = 0
SVMs
3/27/2003 DASFAA Tutorial, Kyoto 83
Important Insight
K(xi, xj) = Cov(zi, zj). To design a kernel is to design a similarity function that produces a positive definite covariance matrix on the training instances.
Basis Function Selection
Three general approaches:
- Restriction methods: limit the class of functions
- Selection methods: scan the dictionary adaptively (boosting)
- Regularization methods: use the entire dictionary but restrict the coefficients (ridge regression)
Overfitting?
Probably not, because:
- There are N free parameters (not D)
- The margin is maximized
Geometrical View
S = wX + b, with |w| = 1 and b = 0
V = {w | yi f(xi) > 0, i = 1..n, |w| = 1} (the version space)
The SVM is the center of the largest sphere contained in V
SVMs
3/27/2003 DASFAA Tutorial, Kyoto 88
BPMs
Bayes objective function:
Ŝt = BayesZ(xt) = argminSi∈S EH|Z=x[l(H(x), Si)]
BPMs [Herbrich et al. 2001]:
Abp = argminh∈H Ex[EH|Z=x[l(H(x), h(x))]]
BPMs
Linear classifier; the input X possesses a spherical Gaussian density
The Bayes point is the center of mass of the version space
BPMs vs. SVMs
BPMs
Use SVMs to find a good h in H, then find the Bayes point via:
- the billiard algorithm [Herbrich et al. 2001]
- the perceptron algorithm [Herbrich et al. 2001]
Billiard Ball Algorithm (R. Herbrich)
Outline
Statistical Learning
Emerging Applications: Data Characteristics
Classical Models (Classification)
Kernel Methods
Dimension Reduction Methods
Dimensionality Curse
D: the data dimension. When D increases:
- Nearest neighbors are no longer local
- All points become nearly equidistant
Sparse High-D Space [C. Aggarwal et al., ICDT 2001]
Hypercube range queries: a query cube of side s in the unit data cube selects a fraction
P = sd
Sparse High-D Space
Spherical Range Queries
P[R ∈ sp(Q, 0.5)] = (0.5)d · πd/2 / Γ(d/2 + 1)
(the fraction of the unit cube covered by a sphere of radius 0.5; see the sketch below)
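A small sketch evaluating both selectivity formulas for data uniform in the unit hypercube; the side s = 0.95 and the dimensions tried are illustrative assumptions:

```python
# Curse of dimensionality: a cube query of side s captures fraction s^d, and a
# radius-0.5 ball centered in the unit cube captures pi^(d/2) * 0.5^d / Gamma(d/2 + 1).
# Both fractions collapse toward zero as d grows.
import math

for d in (2, 10, 50, 100):
    cube = 0.95 ** d                                           # side s = 0.95
    ball = math.pi ** (d / 2) * 0.5 ** d / math.gamma(d / 2 + 1)
    print(f"d={d:3d}  cube(s=0.95): {cube:.3e}  inscribed ball: {ball:.3e}")
```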
Dimensionality Curse
So?
Is the nearest neighbor estimate cursed in high-D spaces?
Yes! When D is large and N is relatively small, the estimate is off!
Are We Doomed?
How does the curse affect classification?
- Similar objects still tend to cluster together
- Classification only makes a binary prediction
Distribution of Distances
3/27/2003 DASFAA Tutorial, Kyoto 105
Some Solutions to High-D
Restricted estimators: specify the nature of the local neighborhood
Adaptive feature reduction: PCA, LDA
Dynamic Partial Function (DPF)
Three Major Paradigms
Preserve the data description in a lower dimensional space: PCA
Maximize discriminability in a lower dimensional space: LDA
Activate only similar channels: DPF
Minkowski Distance
For objects P and Q: D = (ΣM |pi − qi|n)1/n
Assumes similar images are similar in all M features (see the sketch below)
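A minimal sketch of the Minkowski distance, with the usual absolute value (n = 1 gives Manhattan, n = 2 Euclidean); the vectors are illustrative:

```python
# Minkowski distance D = (sum_i |p_i - q_i|^n)^(1/n) over all M features.
import numpy as np

def minkowski(p, q, n=2):
    return np.sum(np.abs(p - q) ** n) ** (1.0 / n)

p = np.array([0.1, 0.4, 0.9])
q = np.array([0.2, 0.1, 0.5])
print(minkowski(p, q, 1), minkowski(p, q, 2))
```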
[Figures: two histograms of feature-distance frequency (log-scale frequency, 1.0E-06 to 1.0E-01, vs. feature distance, 0 to 0.95)]
Weighted Minkowski Distance
D = (ΣM wi|pi − qi|n)1/n
Assumes similar images are similar in the same subset of the M features
[Figures and raw distance matrices: average distance per feature (average distance vs. feature number, 1-144) under four transformations: GIF, scale up/down, cropping, rotation]
Similarity Theories
Objects are similar in all respects (Richardson 1928)
Objects are similar in some respects (Tversky 1977)
Similarity is a process of determining respects, rather than using predefined respects (Goldstone 94)
DPF
Which place is similar to Kyoto?
Partial
Dynamic
Dynamic Partial Function (a sketch follows)
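A hedged sketch of DPF after Li & Chang (2003): sum only the m smallest of the M per-feature differences, so the respects in which two objects are compared are chosen dynamically per pair; m, r, and the 144-feature vectors are illustrative assumptions:

```python
# Dynamic Partial Function: activate only the m most similar channels per pair.
import numpy as np

def dpf(p, q, m=100, r=2):
    diffs = np.abs(p - q)
    active = np.sort(diffs)[:m]             # keep the m smallest feature differences
    return np.sum(active ** r) ** (1.0 / r)

rng = np.random.default_rng(4)
p, q = rng.random(144), rng.random(144)     # e.g. 144 image features
print(dpf(p, q, m=100))
```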
Precision/Recall
Summary
Statistical Learning
Emerging Applications: Data Characteristics
Classical Models (Classification)
Kernel Methods
- Linear Model View
- Nearest Neighbor View
- Geometric View
Dimension Reduction Methods
Emerging DB Applications
N < D
N+ << N-
Examples:
- Information retrieval with relevance feedback
- Gene profiling
- Bioinformatics
Useful Links
Related publications: http://www-db.stanford.edu/~echang/
Software free trial: http://www.imagebeagle.com
Locate objectionable images on your hard drives, before your boss finds them!
References
1. T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer, N.Y., 2001
2. T. Mitchell, Machine Learning, 1997
3. D. Donoho, High-Dimensional Data Analysis, American Math. Society Lecture, 2000
4. S. Tong and E. Chang, Support Vector Machine Active Learning for Image Retrieval, ACM MM, 2001
5. B. Li and E. Chang, Dynamic Partial Function, ACM Journal, 2003
6. D. Chudova and P. Smyth, Pattern Discovery in Sequences under a Markov Assumption, ACM KDD, 2002
7. R. Herbrich, T. Graepel, and C. Campbell, Bayes Point Machines, Journal of Machine Learning Research, 2001
8. V. Vapnik, The Nature of Statistical Learning Theory, Springer, N.Y., 1995
9. T. Jaakkola and D. Haussler, Probabilistic Kernel Regression Models, Conference on AI and Statistics, 1999
10. A. Moore, Support Vector Machines, Lecture Notes, CMU
11. C. Aggarwal, A. Hinneburg, and D. Keim, On the Surprising Behavior of Distance Metrics in High-Dimensional Space, ICDT 2001