Learning with Augmented Features for
Heterogeneous Domain Adaptation
Lixin Duan, Dong Xu, Ivor Tsang
Nanyang Technological University, Singapore
Outline
• Background
• Heterogeneous Feature Augmentation (HFA)
– Learning Augmented Feature Representations
– Formulation with Support Vector Machine
– Kernelized HFA
• Experiments
– Object Recognition
– Multilingual Text Classification
• Conclusion
Background
• Domain Adaptation
– To retain and apply knowledge learned from an existing
domain (a.k.a. the source domain) in order to improve
learning in a new domain (a.k.a. the target domain)
[Figure] Traditional machine learning: training and test data are from the same domain. Domain adaptation: training and test data are from different domains. (Legend: samples in domain A vs. samples in domain B.)
Background
• Heterogeneous Domain Adaptation
– Source and target data have different feature dimensions
• English vs. Chinese
• Text vs. Images
• SIFT features vs. SURF features
• others
Heterogeneous Feature Augmentation
• Mapping heterogeneous features into a common
subspace
– $x^s \in \mathbb{R}^{d_s}$ and $x^t \in \mathbb{R}^{d_t}$, with $d_s \neq d_t$
• Projection $P \in \mathbb{R}^{d_c \times d_s}$: $x^s \mapsto P x^s$
• Projection $Q \in \mathbb{R}^{d_c \times d_t}$: $x^t \mapsto Q x^t$
[Figure] $x^s$ and $x^t$ are projected into the common subspace $\mathbb{R}^{d_c}$
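As a concrete illustration, here is a minimal NumPy sketch of the two projections; the dimensions, random data, and random `P`, `Q` are made-up examples, not learned values:

```python
import numpy as np

rng = np.random.default_rng(0)
d_s, d_t, d_c = 100, 60, 20          # example dimensions: source, target, common

x_s = rng.standard_normal(d_s)       # a source sample in R^{d_s}
x_t = rng.standard_normal(d_t)       # a target sample in R^{d_t}

P = rng.standard_normal((d_c, d_s))  # projection for the source domain
Q = rng.standard_normal((d_c, d_t))  # projection for the target domain

# After projection, both samples live in the common subspace R^{d_c},
# so they can be compared directly, e.g. via an inner product.
z_s, z_t = P @ x_s, Q @ x_t
print(z_s.shape, z_t.shape, float(z_s @ z_t))
```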
Heterogeneous Feature Augmentation
• Feature Augmentation
– New feature mappings $\varphi_s(x^s)$ and $\varphi_t(x^t)$:
$$\varphi_s(x^s) = \begin{bmatrix} P x^s \\ x^s \\ \mathbf{0}_{d_t} \end{bmatrix} \quad \text{and} \quad \varphi_t(x^t) = \begin{bmatrix} Q x^t \\ \mathbf{0}_{d_s} \\ x^t \end{bmatrix}$$
– Data with heterogeneous features can be compared in the
common subspace
– Similarities between data in the same domain can be
enhanced by incorporating the original features [Daumé III, 2007]:
$$K(x_i, x_j) = \begin{cases} 2\, s(x_i, x_j), & \text{if } x_i \text{ and } x_j \text{ are from the same domain} \\ s(x_i, x_j), & \text{if } x_i \text{ and } x_j \text{ are from different domains} \end{cases}$$
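A minimal sketch (reusing the hypothetical dimensions and random `P`, `Q` from above) that builds the augmented features and checks their key property: same-domain inner products pick up the extra original-feature term, while cross-domain inner products only see the common subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
d_s, d_t, d_c = 100, 60, 20
P = rng.standard_normal((d_c, d_s))
Q = rng.standard_normal((d_c, d_t))

def phi_s(x_s):
    """Augmented source feature: [P x_s; x_s; 0_{d_t}]."""
    return np.concatenate([P @ x_s, x_s, np.zeros(d_t)])

def phi_t(x_t):
    """Augmented target feature: [Q x_t; 0_{d_s}; x_t]."""
    return np.concatenate([Q @ x_t, np.zeros(d_s), x_t])

x1, x2 = rng.standard_normal(d_s), rng.standard_normal(d_s)
x3 = rng.standard_normal(d_t)

# Same domain: the similarity includes the original-feature term x1'x2
# on top of the common-subspace term (P x1)'(P x2).
assert np.isclose(phi_s(x1) @ phi_s(x2), x1 @ (P.T @ P) @ x2 + x1 @ x2)

# Different domains: only the common-subspace term (P x1)'(Q x3) survives.
assert np.isclose(phi_s(x1) @ phi_t(x3), x1 @ P.T @ Q @ x3)
print("augmented-feature similarity checks passed")
```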
Linear HFA
• HFA Formulation with SVM
– $w = [w_c^\top, w_s^\top, w_t^\top]^\top$ is a feature weight vector
– The $\xi_i^s$'s and $\xi_i^t$'s are slack variables
– $\lambda_p$ and $\lambda_q$ are pre-defined parameters
$$\min_{P, Q, w, b, \xi^s, \xi^t} \; \frac{1}{2}\|w\|^2 + C \left( \sum_{i=1}^{n_s} \xi_i^s + \sum_{i=1}^{n_t} \xi_i^t \right)$$
$$\text{s.t.} \quad y_i^s \left( w^\top \varphi_s(x_i^s) + b \right) \ge 1 - \xi_i^s, \;\; \xi_i^s \ge 0;$$
$$\qquad\;\; y_i^t \left( w^\top \varphi_t(x_i^t) + b \right) \ge 1 - \xi_i^t, \;\; \xi_i^t \ge 0;$$
$$\qquad\;\; \|P\|_F^2 \le \lambda_p, \;\; \|Q\|_F^2 \le \lambda_q$$
Linear HFA
• Taking the dual w.r.t. $w$, $b$, $\xi_i^s$ and $\xi_i^t$:
– $\alpha$: a vector of dual variables; $y$: a vector of training labels
– $K_{P,Q} = \begin{bmatrix} X_s^\top (I + P^\top P) X_s & X_s^\top P^\top Q X_t \\ X_t^\top Q^\top P X_s & X_t^\top (I + Q^\top Q) X_t \end{bmatrix}$
– It is nontrivial to determine the optimal dimension $d_c$ for
the common subspace
$$\min_{P, Q} \max_{\alpha} \; \mathbf{1}_{n_s + n_t}^\top \alpha - \frac{1}{2} (\alpha \circ y)^\top K_{P,Q} \, (\alpha \circ y)$$
$$\text{s.t.} \quad y^\top \alpha = 0, \;\; \mathbf{0}_{n_s + n_t} \le \alpha \le C \mathbf{1}_{n_s + n_t}; \;\; \|P\|_F^2 \le \lambda_p, \;\; \|Q\|_F^2 \le \lambda_q$$
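For intuition, here is a small NumPy sketch of assembling the block matrix $K_{P,Q}$ that appears in the dual; the data matrices (one column per sample) and the `P`, `Q` stand-ins are random examples:

```python
import numpy as np

rng = np.random.default_rng(0)
d_s, d_t, d_c, n_s, n_t = 100, 60, 20, 30, 10
X_s = rng.standard_normal((d_s, n_s))   # source data, one column per sample
X_t = rng.standard_normal((d_t, n_t))   # target data, one column per sample
P = rng.standard_normal((d_c, d_s))
Q = rng.standard_normal((d_c, d_t))

def K_PQ(P, Q, X_s, X_t):
    """Block kernel matrix K_{P,Q} over all n_s + n_t training samples."""
    K_ss = X_s.T @ (np.eye(d_s) + P.T @ P) @ X_s
    K_st = X_s.T @ P.T @ Q @ X_t
    K_tt = X_t.T @ (np.eye(d_t) + Q.T @ Q) @ X_t
    return np.block([[K_ss, K_st], [K_st.T, K_tt]])

K = K_PQ(P, Q, X_s, X_t)
print(K.shape)   # (n_s + n_t, n_s + n_t); symmetric and PSD by construction
```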
Linear HFA
• Transformation Metric
– Get rid of $d_c$:
$$H = \begin{bmatrix} P^\top \\ Q^\top \end{bmatrix} \begin{bmatrix} P & Q \end{bmatrix} \in \mathbb{R}^{(d_s + d_t) \times (d_s + d_t)}$$
• Linear Formulation with SVM
– $K_H = \begin{bmatrix} X_s^\top X_s + \Lambda_s^\top H \Lambda_s & \Lambda_s^\top H \Lambda_t \\ \Lambda_t^\top H \Lambda_s & X_t^\top X_t + \Lambda_t^\top H \Lambda_t \end{bmatrix}$
– $\Lambda_s = \begin{bmatrix} I_{d_s} \\ \mathbf{0}_{d_t \times d_s} \end{bmatrix} X_s$, $\;\Lambda_t = \begin{bmatrix} \mathbf{0}_{d_s \times d_t} \\ I_{d_t} \end{bmatrix} X_t$, $\;\lambda = \lambda_p + \lambda_q$
$$\min_{H \succeq 0} \max_{\alpha} \; \mathbf{1}_{n_s + n_t}^\top \alpha - \frac{1}{2} (\alpha \circ y)^\top K_H \, (\alpha \circ y)$$
$$\text{s.t.} \quad y^\top \alpha = 0, \;\; \mathbf{0}_{n_s + n_t} \le \alpha \le C \mathbf{1}_{n_s + n_t}; \;\; \mathrm{trace}(H) \le \lambda$$
Linear HFA
• Solution: Iteratively update the transformation metric $H$ using SDP and the dual variable $\alpha$ using SVM (a simplified sketch follows below)
• Issues in Linear HFA
– $H$ is linear, which may not be effective for some tasks
– Infeasible for applications with very high-dimensional data
• Can we learn $H$ with its size dependent on the
number of training samples?
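As referenced in the Solution bullet above, here is a schematic Python sketch of the alternation. It is not the authors' implementation: scikit-learn's `SVC` with a precomputed kernel stands in for the SVM step, and the metric step uses a closed-form rank-one update on the trace ball (for fixed $\alpha$ the dual objective is linear in $H$) in place of the paper's SDP update; all data, dimensions, and parameters are made up.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
d_s, d_t, n_s, n_t = 100, 60, 30, 10
lam, C = 1.0, 1.0                       # lam plays the role of lambda_p + lambda_q

X_s = rng.standard_normal((d_s, n_s))   # source data, one column per sample
X_t = rng.standard_normal((d_t, n_t))   # target data, one column per sample
y = np.where(np.arange(n_s + n_t) % 2 == 0, 1.0, -1.0)  # toy +/-1 labels

# Lambda_s = [I; 0] X_s and Lambda_t = [0; I] X_t, stacked side by side.
Lam = np.zeros((d_s + d_t, n_s + n_t))
Lam[:d_s, :n_s] = X_s
Lam[d_s:, n_s:] = X_t

# Fixed part of K_H: blkdiag(X_s' X_s, X_t' X_t).
K_base = np.zeros((n_s + n_t, n_s + n_t))
K_base[:n_s, :n_s] = X_s.T @ X_s
K_base[n_s:, n_s:] = X_t.T @ X_t

H = lam * np.eye(d_s + d_t) / (d_s + d_t)    # feasible start: trace(H) = lam
for it in range(20):
    K_H = K_base + Lam.T @ H @ Lam
    # alpha-step: a standard SVM dual with the current kernel K_H.
    svm = SVC(C=C, kernel="precomputed").fit(K_H, y)
    alpha_y = np.zeros(n_s + n_t)            # entries are alpha_i * y_i
    alpha_y[svm.support_] = svm.dual_coef_[0]
    # H-step (simplified): with alpha fixed, the objective is linear in H,
    # so its optimum over {H >= 0, trace(H) <= lam} is the rank-one matrix
    # lam * v v' / (v'v), where v = Lam (alpha o y).
    v = Lam @ alpha_y
    if v @ v > 1e-12:
        H = lam * np.outer(v, v) / (v @ v)
print("learned metric trace:", np.trace(H))
```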
Kernelization
Kernelized HFA
• Notations
– Nonlinear mapping function $\phi: x \to \phi(x)$
– Kernel function $k(x_i, x_j) = \phi(x_i)^\top \phi(x_j)$
– Kernel matrices $K_s = \Phi_s^\top \Phi_s$ and $K_t = \Phi_t^\top \Phi_t$
– Projection matrices $P_\phi$ and $Q_\phi$
– Nonlinear feature transformation $H_\phi$
Kernelized HFA
• Learn $\tilde{P} \in \mathbb{R}^{d_c \times n_s}$ and $\tilde{Q} \in \mathbb{R}^{d_c \times n_t}$ instead of $P_\phi$ and $Q_\phi$
• Nonlinear Transformation Metric:
$$H_\phi = \begin{bmatrix} \tilde{P}^\top \\ \tilde{Q}^\top \end{bmatrix} \begin{bmatrix} \tilde{P} & \tilde{Q} \end{bmatrix} \in \mathbb{R}^{(n_s + n_t) \times (n_s + n_t)}$$
Kernelized HFA
• Kernelized Formulation with SVM
– $K_{H_\phi} = \begin{bmatrix} K_s + \tilde{\Lambda}_s^\top H_\phi \tilde{\Lambda}_s & \tilde{\Lambda}_s^\top H_\phi \tilde{\Lambda}_t \\ \tilde{\Lambda}_t^\top H_\phi \tilde{\Lambda}_s & K_t + \tilde{\Lambda}_t^\top H_\phi \tilde{\Lambda}_t \end{bmatrix}$
– $\tilde{\Lambda}_s = \begin{bmatrix} I_{n_s} \\ \mathbf{0}_{n_t \times n_s} \end{bmatrix} K_s^{1/2}$, $\;\tilde{\Lambda}_t = \begin{bmatrix} \mathbf{0}_{n_s \times n_t} \\ I_{n_t} \end{bmatrix} K_t^{1/2}$
• Solution: Iteratively update the transformation metric $H_\phi$ and the dual variable $\alpha$
$$\min_{H_\phi \succeq 0} \max_{\alpha} \; \mathbf{1}_{n_s + n_t}^\top \alpha - \frac{1}{2} (\alpha \circ y)^\top K_{H_\phi} \, (\alpha \circ y)$$
$$\text{s.t.} \quad y^\top \alpha = 0, \;\; \mathbf{0}_{n_s + n_t} \le \alpha \le C \mathbf{1}_{n_s + n_t}; \;\; \mathrm{trace}(H_\phi) \le \lambda$$
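A minimal sketch of assembling $K_{H_\phi}$ from the definitions above. The RBF kernel, random data, and random PSD `H_phi` are made-up placeholders (not learned values), and `scipy.linalg.sqrtm` supplies the matrix square roots $K_s^{1/2}$ and $K_t^{1/2}$:

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)
n_s, n_t = 30, 10
X_s = rng.standard_normal((n_s, 5))   # raw source samples (rows), d_s = 5
X_t = rng.standard_normal((n_t, 8))   # raw target samples (rows), d_t = 8

def rbf(X, gamma=0.5):
    """RBF kernel matrix: k(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

K_s, K_t = rbf(X_s), rbf(X_t)

# Lambda_s = [I; 0] K_s^{1/2} and Lambda_t = [0; I] K_t^{1/2}, side by side.
Lam = np.zeros((n_s + n_t, n_s + n_t))
Lam[:n_s, :n_s] = np.real(sqrtm(K_s))
Lam[n_s:, n_s:] = np.real(sqrtm(K_t))

# A random PSD H_phi with trace <= lam, just to exercise the formula.
lam = 1.0
A = rng.standard_normal((n_s + n_t, n_s + n_t))
H_phi = A @ A.T
H_phi *= lam / np.trace(H_phi)

K_base = np.zeros((n_s + n_t, n_s + n_t))
K_base[:n_s, :n_s] = K_s
K_base[n_s:, n_s:] = K_t
K_H_phi = K_base + Lam.T @ H_phi @ Lam   # the kernel fed to the SVM dual
print(K_H_phi.shape)
```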
Related Work
• ARC-t [Kulis et al., 2011]
– It learns an asymmetric transformation metric between
the different feature spaces.
• DAMA [Wang et al., 2011]
– It learns a common feature subspace by utilizing the class
labels of the source and target training data for manifold
alignment.
• HeMap [Shi et al., 2010]
– Unsupervised. It finds the projection matrices for a
common feature subspace and also learns the optimal
projected data from both domains.
Experiments
• Object Dataset [Kulis et al., 2011]
– 4,106 images with 31 categories from three sources
– Source: amazon or webcam; Target: dslr
– Test data: The remaining dslr images are used
– Classification accuracy
$$\mathrm{Acc} = \frac{\#\,\text{correctly classified samples}}{\#\,\text{total test samples}}$$
Experiments
• Reuters Multilingual Dataset [Amini et al., 2009]
– 11,547 documents with 6 classes from 5 sources
– Source: English, French, German or Italian; Target: Spanish
– Test data: The remaining Spanish documents are used
– Classification accuracy
$$\mathrm{Acc} = \frac{\#\,\text{correctly classified samples}}{\#\,\text{total test samples}}$$
Experiments
• Results
– Object dataset [Kulis et al., 2011]
– Reuters multilingual dataset [Amini et al., 2009]
– Our HFA method is significantly better than the other
methods under both settings, according to a t-test at the
0.05 significance level
Experiments
• Results w.r.t. # Target Training Samples Per Class
– Reuters multilingual dataset [Amini et al., 2009]
• Convergence Analysis
– "back_pack" on the object dataset
– "C15" on the Reuters multilingual dataset
– Converges in fewer than 80 and 40 iterations on the two
datasets, respectively
References
Amini, M., Usunier, N., and Goutte, C. Learning from multiple
partially observed views – an application to multilingual text
categorization. In NIPS, 2009.
Daumé III, H. Frustratingly easy domain adaptation. In ACL, 2007.
Kulis, B., Saenko, K., and Darrell, T. What you saw is not what you
get: Domain adaptation using asymmetric kernel transforms. In
CVPR, 2011.
Shi, X., Liu, Q., Fan, W., Yu, P. S., and Zhu, R. Transfer learning on
heterogenous feature spaces via spectral transformation. In
ICDM, 2010.
Wang, C. and Mahadevan, S. Heterogeneous domain adaptation
using manifold alignment. In IJCAI, 2011.
Thank you!
Q & A