COMPUTATIONAL MODELS FOR DATA ANALYSIS
Kernel Methods
Alessandra Giordani
Department of Information and Communication Technology
University of Trento Email: [email protected]

Linear Classifier
The equation of a hyperplane is
$f(\vec{x}) = \vec{x} \cdot \vec{w} + b = 0, \quad \vec{x}, \vec{w} \in \mathbb{R}^n, \ b \in \mathbb{R}$
$\vec{x}$ is the vector representing the example to classify
$\vec{w}$ is the gradient of the hyperplane
The classification function is
$h(x) = \mathrm{sign}(f(x))$
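As a minimal sketch of the classification function, the snippet below evaluates f and h for a hypothetical hyperplane x1 + x2 - 1 = 0; the weights, the bias and the two test points are illustrative assumptions, not values from the slides.

```python
def f(x, w, b):
    """Signed score of x with respect to the hyperplane x . w + b = 0."""
    return sum(xi * wi for xi, wi in zip(x, w)) + b

def h(x, w, b):
    """Classification function h(x) = sign(f(x))."""
    score = f(x, w, b)
    return (score > 0) - (score < 0)

# Hypothetical hyperplane x1 + x2 - 1 = 0 in R^2 (illustrative values)
w = [1.0, 1.0]
b = -1.0
label_pos = h([2.0, 2.0], w, b)  # point on the positive side
label_neg = h([0.0, 0.0], w, b)  # point on the negative side
```

Points lying exactly on the hyperplane get the label 0 here; a real classifier would have to pick one side by convention.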

The main idea of Kernel Functions
Mapping vectors in a space where they are linearly separable
$\vec{x} \rightarrow \phi(\vec{x})$
[Figure: the points x and o in the input space and their images φ(x) and φ(o) under the mapping φ]

A mapping example
Given two masses m1 and m2, one is constrained
Apply a force fa to the mass m1
Experiments
Features m1, m2 and fa
We want to learn a classifier that tells when a mass m1 will get far away from m2
If we consider Newton's law of gravitation
$f(m_1, m_2, r) = C \frac{m_1 m_2}{r^2}$
we need to find when f(m1, m2, r) < fa

A mapping example (2)
$\vec{x} = (x_1, \ldots, x_n) \rightarrow \phi(\vec{x}) = (\phi_1(\vec{x}), \ldots, \phi_n(\vec{x}))$
The gravitational law is not linear so we need to change space
$(f_a, m_1, m_2, r) \rightarrow (k, x, y, z) = (\ln f_a, \ln m_1, \ln m_2, \ln r)$
As
$\ln f(m_1, m_2, r) = \ln C + \ln m_1 + \ln m_2 - 2 \ln r = c + x + y - 2z$
We need the hyperplane
$\ln f_a - \ln m_1 - \ln m_2 + 2 \ln r - \ln C = 0$
In the new space this is $(1, 1, -2) \cdot (x, y, z) - k + c = 0$, so we can decide without error if the mass will get far away or not
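A small sketch of this change of space: after taking logarithms, the non-linear test f(m1, m2, r) < fa becomes a linear test against a hyperplane. The numeric value of C and the sample masses are assumptions chosen only for illustration.

```python
import math

C = 6.674e-11  # assumed value for the constant C (gravitational constant)

def phi(f_a, m1, m2, r):
    """Map (f_a, m1, m2, r) to (k, x, y, z) = (ln f_a, ln m1, ln m2, ln r)."""
    return math.log(f_a), math.log(m1), math.log(m2), math.log(r)

def gets_far_away(f_a, m1, m2, r):
    """Linear decision in log space, equivalent to f(m1, m2, r) < f_a."""
    k, x, y, z = phi(f_a, m1, m2, r)
    c = math.log(C)
    # Hyperplane k - x - y + 2z - c = 0; the positive side means f_a exceeds gravity
    return k - x - y + 2 * z - c > 0

def gravity(m1, m2, r):
    """Direct, non-linear computation used only to cross-check the linear rule."""
    return C * m1 * m2 / r ** 2

strong = gets_far_away(1e-3, 1000.0, 1000.0, 1.0)  # applied force above gravity
weak = gets_far_away(1e-6, 1000.0, 1000.0, 1.0)    # applied force below gravity
```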

A kernel-based Machine: Perceptron training
w_0 ← 0; b_0 ← 0; k ← 0; R ← max_{1≤i≤l} ||x_i||
do
  for i = 1 to l
    if y_i (w_k · x_i + b_k) ≤ 0 then
      w_{k+1} = w_k + η y_i x_i
      b_{k+1} = b_k + η y_i R^2
      k = k + 1
    endif
  endfor
while an error is found
return k, (w_k, b_k)
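The training loop above can be sketched in plain Python as follows. The `max_epochs` guard and the toy data set are assumptions added for illustration; the pseudocode itself loops until no error is found, which only terminates on linearly separable data.

```python
def train_perceptron(X, y, eta=1.0, max_epochs=1000):
    """Perceptron training; returns the number of updates k and (w, b)."""
    w = [0.0] * len(X[0])
    b = 0.0
    k = 0
    # R is the largest norm among the training vectors
    R = max(sum(v * v for v in x) ** 0.5 for x in X)
    for _ in range(max_epochs):  # assumed safety bound, not in the pseudocode
        error_found = False
        for x_i, y_i in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, x_i)) + b
            if y_i * score <= 0:
                # Misclassified example: move the hyperplane towards it
                w = [wj + eta * y_i * xj for wj, xj in zip(w, x_i)]
                b += eta * y_i * R ** 2
                k += 1
                error_found = True
        if not error_found:
            break
    return k, w, b

# Hypothetical linearly separable data
X = [[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
k, w, b = train_perceptron(X, y)
```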

Kernel Function Definition
Kernels are the scalar product of mapping functions, $k(\vec{x}, \vec{z}) = \vec{\phi}(\vec{x}) \cdot \vec{\phi}(\vec{z})$, such as
$\vec{x} \in \mathbb{R}^n, \quad \vec{\phi}(\vec{x}) = (\phi_1(\vec{x}), \phi_2(\vec{x}), \ldots, \phi_m(\vec{x})) \in \mathbb{R}^m$

The Kernel Gram Matrix
With KM-based learning, the sole information used from the training data set is the Kernel Gram Matrix
$K_{training} = \begin{pmatrix} k(x_1, x_1) & k(x_1, x_2) & \ldots & k(x_1, x_m) \\ k(x_2, x_1) & k(x_2, x_2) & \ldots & k(x_2, x_m) \\ \ldots & \ldots & \ldots & \ldots \\ k(x_m, x_1) & k(x_m, x_2) & \ldots & k(x_m, x_m) \end{pmatrix}$
If the kernel is valid, K is symmetric positive semi-definite.
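A minimal sketch, assuming NumPy is available, of how the Gram matrix could be built and checked for symmetry and non-negative eigenvalues. The three sample points and the choice of the linear kernel are illustrative assumptions.

```python
import numpy as np

def gram_matrix(X, kernel):
    """Return the m x m matrix with entries kernel(x_i, x_j)."""
    m = len(X)
    K = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            K[i, j] = kernel(X[i], X[j])
    return K

def linear_kernel(x, z):
    return float(np.dot(x, z))

# Hypothetical training points
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
K = gram_matrix(X, linear_kernel)

# eigvalsh is meant for symmetric matrices and returns real eigenvalues
eigenvalues = np.linalg.eigvalsh(K)
is_symmetric = bool(np.allclose(K, K.T))
is_psd = bool(eigenvalues.min() >= -1e-9)  # small tolerance for rounding
```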

Valid Kernels

Mercer's condition
If the Gram matrix
$G = \big(k(\vec{x}_i, \vec{x}_j)\big)_{i,j}$
is positive semi-definite there is a mapping φ that produces the target kernel function

Mercer's Theorem (finite space)
Let us consider $K = \big(K(\vec{x}_i, \vec{x}_j)\big)_{i,j=1}^{n}$
K symmetric ⇒ ∃V: $K = V \Lambda V'$ for Takagi factorization of a complex-symmetric matrix, where:
Λ is the diagonal matrix of the eigenvalues λt of K
$\vec{v}_t = (v_{ti})_{i=1}^{n}$ are the eigenvectors, i.e. the columns of V
Let us assume lambda values non-negative, so that we can define $\Phi(\vec{x}_i) = (\sqrt{\lambda_t}\, v_{ti})_{t=1}^{n} \in \mathbb{R}^n$

Mercer's Theorem (sufficient conditions)
Therefore
$\Phi(\vec{x}_i) \cdot \Phi(\vec{x}_j) = \sum_{t=1}^{n} \lambda_t v_{ti} v_{tj} = (V \Lambda V')_{ij} = K_{ij} = K(\vec{x}_i, \vec{x}_j)$,
which implies that K is a kernel function
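The identity above can be checked numerically. The sketch below, assuming NumPy, takes a small positive semi-definite matrix (an illustrative choice), builds feature vectors whose t-th component is sqrt(λt) times the matching eigenvector entry, and verifies that their scalar products give back K.

```python
import numpy as np

# Hypothetical Gram matrix (symmetric, positive semi-definite)
K = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])

# eigh returns the eigenvalues and a matrix V whose columns are the eigenvectors
lam, V = np.linalg.eigh(K)
lam = np.clip(lam, 0.0, None)  # remove tiny negative values caused by rounding

# Row i of Phi is the feature vector of x_i: component t is sqrt(lam_t) * V[i, t]
Phi = V * np.sqrt(lam)

# Scalar products of the feature vectors should reproduce K
K_rebuilt = Phi @ Phi.T
```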

Mercer's Theorem (necessary conditions)
Suppose we have a negative eigenvalue $\lambda_s$ with unit-norm eigenvector $\vec{v}_s$; the following point
$\vec{z} = \sum_{i=1}^{n} v_{si} \Phi(\vec{x}_i)$
has the following squared norm:
$\|\vec{z}\|^2 = \sum_{i,j} v_{si} v_{sj} K(\vec{x}_i, \vec{x}_j) = \vec{v}_s' K \vec{v}_s = \lambda_s < 0$;
this contradicts the geometry of the space.

Is it a valid kernel?
It may not be a kernel, so we can use M'·M
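A short numerical illustration, assuming NumPy: the symmetric matrix M below (an arbitrary example) has a negative eigenvalue, so it cannot be a Gram matrix, whereas M'·M has only non-negative eigenvalues.

```python
import numpy as np

# Hypothetical symmetric similarity matrix that is not positive semi-definite
M = np.array([[1.0, 2.0],
              [2.0, 1.0]])
eig_M = np.linalg.eigvalsh(M)   # eigenvalues -1 and 3

# M' . M is always symmetric positive semi-definite
K = M.T @ M
eig_K = np.linalg.eigvalsh(K)   # eigenvalues 1 and 9
```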

Valid Kernel operations
k(x,z) = k1(x,z)+k2(x,z)
k(x,z) = k1(x,z)*k2(x,z)
k(x,z) = α k1(x,z), with α ≥ 0
k(x,z) = f(x)f(z)
k(x,z) = k1(φ(x),φ(z))
k(x,z) = x'Bz, with B symmetric positive semi-definite
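The closure properties listed above can be probed numerically on Gram matrices. This is only a sanity check on random data under the assumption that NumPy is available, not a proof; the data, the two base kernels and the tolerance are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))       # hypothetical sample of 5 points in R^3

K1 = X @ X.T                      # Gram matrix of the linear kernel
K2 = (X @ X.T + 1.0) ** 2         # Gram matrix of a degree-2 polynomial kernel

def is_psd(K, tol=1e-8):
    """Symmetric with no eigenvalue below -tol."""
    return bool(np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol)

checks = {
    "sum": is_psd(K1 + K2),
    "product": is_psd(K1 * K2),   # element-wise product of the Gram matrices
    "scaling": is_psd(3.0 * K1),  # alpha = 3 >= 0
}
```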

Basic Kernels for unstructured data
Linear Kernel
Polynomial Kernel
Lexical kernel
String Kernel

Linear Kernel
In Text Categorization documents are word vectors
The dot product counts the number of features in common
This provides a sort of similarity
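As an illustration, with binary word vectors the dot product is exactly the number of shared words. The vocabulary and the two toy documents below are assumptions made up for the example.

```python
def bag_of_words(text, vocabulary):
    """Binary word vector: 1 if the vocabulary term occurs in the text."""
    words = set(text.lower().split())
    return [1 if term in words else 0 for term in vocabulary]

def linear_kernel(x, z):
    """Dot product of two vectors."""
    return sum(xi * zi for xi, zi in zip(x, z))

# Hypothetical vocabulary and documents
vocabulary = ["stock", "market", "downtown", "bank", "price"]
d1 = bag_of_words("stock market price", vocabulary)
d2 = bag_of_words("downtown market bank", vocabulary)

shared = linear_kernel(d1, d2)  # only "market" occurs in both documents
```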

Feature Conjunction (polynomial Kernel)
The initial vectors are mapped in a higher space
$\Phi(\langle x_1, x_2 \rangle) \rightarrow (x_1^2, x_2^2, \sqrt{2} x_1 x_2, \sqrt{2} x_1, \sqrt{2} x_2, 1)$
More expressive, as $(x_1 x_2)$ encodes Stock+Market vs. Downtown+Market features
We can smartly compute the scalar product as
$\Phi(\vec{x}) \cdot \Phi(\vec{z}) =$
$= (x_1^2, x_2^2, \sqrt{2} x_1 x_2, \sqrt{2} x_1, \sqrt{2} x_2, 1) \cdot (z_1^2, z_2^2, \sqrt{2} z_1 z_2, \sqrt{2} z_1, \sqrt{2} z_2, 1) =$
$= x_1^2 z_1^2 + x_2^2 z_2^2 + 2 x_1 x_2 z_1 z_2 + 2 x_1 z_1 + 2 x_2 z_2 + 1 =$
$= (x_1 z_1 + x_2 z_2 + 1)^2 = (\vec{x} \cdot \vec{z} + 1)^2 = K_{Poly}(\vec{x}, \vec{z})$
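The equality between the explicit scalar product and the polynomial kernel can be verified on a concrete pair of vectors. The two input vectors below are arbitrary illustrative values.

```python
import math

def phi(x):
    """Explicit degree-2 feature map for a 2-dimensional vector."""
    x1, x2 = x
    s = math.sqrt(2.0)
    return (x1 * x1, x2 * x2, s * x1 * x2, s * x1, s * x2, 1.0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def k_poly(x, z):
    """Polynomial kernel (x . z + 1)^2, computed in the input space."""
    return (dot(x, z) + 1.0) ** 2

x, z = (1.0, 2.0), (3.0, 0.5)   # hypothetical input vectors
explicit = dot(phi(x), phi(z))  # six multiplications in the mapped space
implicit = k_poly(x, z)         # one dot product in the original space
```

Both values agree up to rounding, but the kernel never builds the six-dimensional vectors.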

Document Similarity
[Figure: Doc 1 and Doc 2 with the words industry, company, telephone, market, product]

Lexical Semantic Kernel [CoNLL 2005]
The document similarity is the SK function:
$SK(d_1, d_2) = \sum_{w_1 \in d_1, w_2 \in d_2} s(w_1, w_2)$
where s is any similarity function between words, e.g. WordNet [Basili et al., 2005] similarity or LSA [Cristianini et al., 2002]
Good results when training data is small
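A minimal sketch of the SK function. The word similarity used here is a hypothetical hand-written table standing in for a WordNet or LSA similarity; its scores and the two toy documents are assumptions.

```python
def semantic_kernel(d1, d2, s):
    """SK(d1, d2): sum of s(w1, w2) over all word pairs of the two documents."""
    return sum(s(w1, w2) for w1 in d1 for w2 in d2)

# Hypothetical similarity scores between word pairs (illustrative only)
SIMILARITY = {
    frozenset(("industry", "company")): 0.8,
    frozenset(("telephone", "product")): 0.5,
}

def s(w1, w2):
    """Toy word similarity: 1 for identical words, table lookup otherwise."""
    if w1 == w2:
        return 1.0
    return SIMILARITY.get(frozenset((w1, w2)), 0.0)

doc1 = ["industry", "telephone"]
doc2 = ["company", "product"]
similarity = semantic_kernel(doc1, doc2, s)
```

The two documents share no word, so a linear kernel would give 0, while the semantic kernel still returns a positive similarity.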

Using character sequences
"bank" → bank, ank, bnk, bk, b, ...
"rank" → rank, ank, rnk, rk, r, ...
The scalar product counts the number of common substrings

String Kernel
Given two strings, the number of matches between their substrings is evaluated
E.g. Bank and Rank
B, a, n, k, Ba, Ban, Bank, Bk, an, ank, nk,..
R, a, n, k, Ra, Ran, Rank, Rk, an, ank, nk,..
String kernel over sentences and texts
Huge space but there are efficient algorithms
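The sketch below is a brute-force illustration, not one of the efficient algorithms mentioned above. It assumes the usual gap-weighted subsequence formulation, in which every occurrence of a subsequence is weighted by a decay factor `lam` raised to the length of the span it covers; the function names and the `lam` parameter are assumptions. It enumerates all subsequences, so it is only usable on short strings.

```python
from collections import defaultdict
from itertools import combinations

def subsequence_features(s, lam=0.5):
    """Map each subsequence u of s to the sum of lam ** span over its occurrences."""
    features = defaultdict(float)
    for length in range(1, len(s) + 1):
        for idx in combinations(range(len(s)), length):
            u = "".join(s[i] for i in idx)
            span = idx[-1] - idx[0] + 1  # characters covered, gaps included
            features[u] += lam ** span
    return features

def string_kernel(s, t, lam=0.5):
    """Scalar product of the two subsequence feature vectors."""
    fs, ft = subsequence_features(s, lam), subsequence_features(t, lam)
    return sum(weight * ft[u] for u, weight in fs.items() if u in ft)

# With lam = 1 the kernel simply counts matching subsequence occurrences:
# "bank" and "rank" share a, n, k, an, ak, nk, ank
common = string_kernel("bank", "rank", lam=1.0)
weighted = string_kernel("bank", "rank", lam=0.5)
```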

Formal Definition
[Equations: formal definition of the string kernel]

Kernel between Bank and Rank

An example of string kernel computation

String Kernels for OCR

Pixel Representation

Sequence of bits
[Figure: a character image encoded row by row as bit strings L1 ... L8, e.g. L1 = 00011100 and L8 = 00001100]
$SK(im_a, im_b) = \sum_{i=1..8} SK(L_i^a, L_i^b)$

Results
Using columns+rows+diagonals

Tree kernels
Subtree, Subset Tree, Partial Tree kernels
Efficient computation

Main Idea of Tree Kernels

Example of a syntactic parse tree
“John delivers a talk in Rome”
S → N VP
VP → V NP PP
PP → IN N
N → Rome
[Figure: parse tree (S (N John) (VP (V delivers) (NP (D a) (N talk)) (PP (IN in) (N Rome))))]

The Syntactic Tree Kernel (STK) [Collins and Duffy, 2002]
[Figure: fragments of the tree (VP (V delivers) (NP (D a) (N talk))), such as (VP V NP), (NP D N) and (V delivers)]

The overall fragment set
[Figure: the overall set of tree fragments]
Children are not divided

Explicit kernel space
The scalar product in this space counts the number of common substructures

Efficient evaluation of the scalar product
[Collins and Duffy, ACL 2002] evaluate ∆ in O(n²):
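The formula for ∆ is not preserved in this transcript, so the sketch below assumes the recursion as it is usually stated for this kernel: ∆(n1, n2) is 0 when the two nodes have different productions, λ when they are matching pre-terminals, and otherwise λ times the product over the children of (1 + ∆). Trees are written as nested tuples `(label, child, ...)` with words as plain strings; this encoding, the function names and the `lam` parameter are assumptions. The recursion is not memoized, so this is an illustration of ∆ rather than the O(n²) evaluation itself.

```python
def is_leaf(node):
    return isinstance(node, str)

def production(node):
    """Grammar rule at a node; the flag separates words from non-terminals."""
    return (node[0], tuple((c, True) if is_leaf(c) else (c[0], False)
                           for c in node[1:]))

def internal_nodes(tree):
    """All non-leaf nodes of the tree, root included."""
    if is_leaf(tree):
        return []
    nodes = [tree]
    for child in tree[1:]:
        nodes.extend(internal_nodes(child))
    return nodes

def delta(n1, n2, lam=1.0):
    """Weighted number of common fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    value = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not is_leaf(c1):  # equal productions guarantee c2 is internal too
            value *= 1.0 + delta(c1, c2, lam)
    return value

def tree_kernel(t1, t2, lam=1.0):
    """Sum of delta over all pairs of internal nodes."""
    return sum(delta(n1, n2, lam)
               for n1 in internal_nodes(t1) for n2 in internal_nodes(t2))

t1 = ("VP", ("V", "delivers"), ("NP", ("D", "a"), ("N", "talk")))
t2 = ("VP", ("V", "gives"), ("NP", ("D", "a"), ("N", "talk")))
same = tree_kernel(t1, t1)   # every fragment of t1 matches itself
close = tree_kernel(t1, t2)  # fragments containing the verb word do not match
```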

SubTree (ST) Kernel [Vishwanathan and Smola, 2002]
[Figure: subtrees of (VP (V delivers) (NP (D a) (N talk))), each including all descendants down to the leaves, such as (NP (D a) (N talk)) and (V delivers)]

Evaluation
Given the equation for STK

SVM-light-TK Software
Encodes ST, STK and combination kernels in SVM-light [Joachims, 1999]
Available at http://dit.unitn.it/~moschitt/
Tree forests, vector sets
The new SVM-Light-TK toolkit will be released asap (email me to have the current version)

Practical Example on Question Classification
Definition: What does HTML stand for?
Description: What's the final line in the Edgar Allan Poe poem "The Raven"?
Entity: What foods can cause allergic reaction in people?
Human: Who won the Nobel Peace Prize in 1992?
Location: Where is the Statue of Liberty?
Manner: How did Bob Marley die?
Numeric: When was Martin Luther King Jr. born?
Organization: What company makes Bentley cars?

Conclusions
Dealing with the noise and errors of NLP modules requires robust approaches
SVMs are robust to noise and Kernel methods allow for:
Syntactic information via STK
Shallow Semantic Information via PTK
Word/POS sequences via String Kernels
When the IR task is complex, syntax and semantics are essential
⇒ Great improvement in Q/A classification
SVM-Light-TK: an efficient tool to use them

SVM-light-TK Software
Encodes ST, SST and combination kernels in SVM-light [Joachims, 1999]
Available at http://dit.unitn.it/~moschitt/
Tree forests, vector sets
New extensions: the PT kernel will be released asap

An introductory book on SVMs, Kernel methods and Text Categorization