fcv rep darrell
TRANSCRIPT
Learning visual representations
for unfamiliar environments
Kate Saenko, Brian Kulis,
Trevor Darrell
UC Berkeley EECS & ICSI
The challenge of large-scale visual interaction
The last decade has proven the superiority of models learned from data over hand-engineered structures!
Large-scale learning
• “Unsupervised”: learn models from “found data”; often exploit multiple modalities (text + image)
[Example found data: product images with accompanying text, e.g. “… The Tote is the perfect example of two handbag design principles that … The lines of this tote are incredibly sleek, but … The semi buckles that form the handle attachments are …”]
E.g., finding visual senses
Artifact sense: “telephone” (DICTIONARY)
1: (n) telephone, phone, telephone set (electronic equipment that converts sound into electrical signals that can be transmitted over distances and then converts received signals back into sounds)
2: (n) telephone, telephony (transmitting speech at a distance)
[Saenko and Darrell ’09]
Large-scale learning
• “Unsupervised”: learn models from “found data”; often exploit multiple modalities (text + image)
• Supervised: crowdsource labels (e.g., ImageNet)
Yet…
• Even the best collection of images from the web and strong machine learning methods can often yield poor classifiers on in-situ data!
• Supervised learning assumption: training distribution == test distribution
• Unsupervised learning assumption: the joint distribution is stationary w.r.t. the online world and the real world
Almost never true!
“What You Saw Is Not What You Get”
The models fail due to domain shift
SVM: 54%, NBNN: 61%
SVM: 20%, NBNN: 19%
Examples of visual domain shifts
• Close-up vs. far-away
• amazon.com vs. consumer images
• FLICKR vs. CCTV
• digital SLR vs. webcam
Examples of domain shift:
change in camera, feature type, and dimension
• digital SLR vs. webcam
• SURF, VQ to 300 words vs. SIFT, VQ to 1000 words → different feature dimensions
Solutions?
• Do nothing (poor performance)
• Collect all types of data (impossible)
• Find out what changed (impractical)
• Learn what changed
Prior Work on Domain Adaptation
• Pre-process the data [Daumé ’07]: replicate features to also create source- and target-specific versions; re-train the learner on the new features
• SVM-based methods [Yang ’07], [Jiang ’08], [Duan ’09], [Duan ’10]: adapt SVM parameters
• Kernel mean matching [Gretton ’09]: re-weight training data to match the test data distribution
Our paradigm: Transform-based Domain Adaptation
Previous methods’ drawbacks:
• cannot transfer the learned shift to new categories
• cannot handle new features
We can do both by learning domain transformations* (illustrated as a transform W between “green” and “blue” example domains)
* Saenko, Kulis, Fritz, and Darrell. Adapting visual category models to new domains. ECCV, 2010
Limitations of symmetric transforms
Saenko et al. ECCV ’10 used metric learning:
• symmetric transforms
• same features
But the symmetric assumption fails! How do we learn more general shifts?
Asymmetric transform (rotation)
Latest approach*: asymmetric transforms
• The metric learning model is no longer applicable
• We propose to learn asymmetric transforms
– Map from target to source
– Handle different dimensions
* Kulis, Saenko, and Darrell. What You Saw Is Not What You Get: Domain Adaptation Using Asymmetric Kernel Transforms. CVPR 2011
Model Details
• Learn a linear transformation to map points from one domain to another
– Call this transformation W
– Collect the source and target points into matrices X and Y
Loss Functions
Choose a point x from the source and y from the target, and consider the transformed inner product x^T W y:
it should be “large” for similar objects and “small” for dissimilar objects.
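This similarity can be sketched numerically. Everything below is illustrative rather than learned: the dimensions echo the SURF/SIFT example from the earlier slide, and W is a random stand-in for a trained transform.

```python
import numpy as np

# Illustrative dimensions only: e.g., source = SIFT quantized to 1000 words,
# target = SURF quantized to 300 words, so W must be rectangular.
d_src, d_tgt = 1000, 300

rng = np.random.default_rng(0)
W = rng.normal(size=(d_src, d_tgt))   # random stand-in for a learned transform

x = rng.normal(size=d_src)            # a source point
y = rng.normal(size=d_tgt)            # a target point

# The learned similarity x^T W y: training drives this to be "large" for
# similar (same-class) cross-domain pairs and "small" for dissimilar ones.
sim = x @ W @ y
```

Note that nothing here requires the two domains to share a feature space; the rectangular W absorbs the dimension mismatch.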
Loss Functions
• The input to the problem includes a collection of m loss functions
• General assumption: the loss functions depend on the data only through the inner product matrix X^T W Y
Regularized Objective Function
• Minimize a linear combination of the sum of the loss functions and a regularizer:
min_W r(W) + lambda * sum_i c_i(X^T W Y)
• We use the squared Frobenius norm ||W||_F^2 as the regularizer
– Not restricted to this choice
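A minimal numeric sketch of this kind of objective, on synthetic data. The slides do not fix the exact form of the losses, so a simple squared loss on per-pair similarity targets t_i is used here as a stand-in, minimized by plain gradient descent:

```python
import numpy as np

rng = np.random.default_rng(1)
d_src, d_tgt, n = 20, 15, 50

X = rng.normal(size=(d_src, n))   # source points as columns
Y = rng.normal(size=(d_tgt, n))   # target points as columns
# Hypothetical constraints: pair i should score +1 if similar, -1 if dissimilar
t = np.where(rng.random(n) < 0.5, 1.0, -1.0)

lam, lr = 0.1, 1e-3

def objective(W):
    sims = np.einsum('di,de,ei->i', X, W, Y)          # x_i^T W y_i for each pair
    return (W ** 2).sum() + lam * ((sims - t) ** 2).sum()

W = np.zeros((d_src, d_tgt))
for _ in range(300):
    sims = np.einsum('di,de,ei->i', X, W, Y)
    resid = sims - t
    # Gradient of ||W||_F^2 + lam * sum_i (x_i^T W y_i - t_i)^2
    W -= lr * (2 * W + 2 * lam * (X * resid) @ Y.T)
```

The Frobenius term keeps W small while the loss term pulls the cross-domain similarities toward their targets; swapping in another convex loss only changes the `resid` computation.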
The Model Has Drawbacks
• A linear transformation may be insufficient
• The cost of optimization grows as the product of the dimensionalities of the source and target data
• What to do?
Kernelization
• Main idea: run in kernel space
– Use a non-linear kernel function (e.g., the RBF kernel) to learn non-linear transformations in input space
– The resulting optimization is independent of the input dimensionality
– An additional assumption is necessary: the regularizer must be a spectral function
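The dimension-independence point can be illustrated with kernel matrices. The bandwidth and synthetic points below are arbitrary choices for the sketch:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1e-3):
    """K[i, j] = exp(-gamma * ||a_i - b_j||^2), with points stored as rows."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))   # clamp tiny negative round-off

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 1000))    # 6 source points in a 1000-d feature space
Y = rng.normal(size=(4, 300))     # 4 target points in a 300-d feature space

K_X = rbf_kernel(X, X)            # 6 x 6
K_Y = rbf_kernel(Y, Y)            # 4 x 4
# The kernelized problem is posed on K_X and K_Y alone, so its size depends on
# the number of points, not on the (possibly different) input dimensionalities.
```

This is why the kernel trick also sidesteps the cost issue from the previous slide: the product of the two input dimensionalities never appears.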
Kernelization
• Original transformation learning problem
• Kernel matrices for source and target
• New kernel problem
• Relationship between the original and new problems at optimality
Summary of approach
1. Multi-domain data (points in each input space)
2. Generate constraints, learn W
3. Map via W
4. Apply to new categories (test points y1, y2)
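Step 4, applying the learned transform to new categories, might look like the following sketch, where src_pts, src_labels, and W are all synthetic stand-ins rather than the trained quantities:

```python
import numpy as np

rng = np.random.default_rng(3)
d_src, d_tgt = 10, 6

# Hypothetical labeled source data; the labels may include categories that
# were never used when W was trained, since the transform is category-independent.
src_pts = rng.normal(size=(8, d_src))             # 8 labeled source points (rows)
src_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
W = rng.normal(size=(d_src, d_tgt))               # random stand-in for a learned W

def classify_target(y):
    """Score every source point by the learned similarity x^T W y; return the best label."""
    sims = src_pts @ W @ y        # one similarity per source point
    return src_labels[np.argmax(sims)]

pred = classify_target(rng.normal(size=d_tgt))    # a target-domain test point
```

Because classification reduces to nearest-neighbor search under the learned similarity, nothing category-specific is baked into W, which is what allows transfer to unseen classes.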
Multi-domain dataset
Experimental Setup
• Utilized a standard bag-of-visual-words model
• Also utilized different features in the target domain
– SURF vs. SIFT
– Different visual-word dictionaries
• Baseline for comparing such data: KCCA
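A bag-of-visual-words pipeline of the kind described (vector quantization to a fixed dictionary, as in “VQ to 300”) can be sketched as follows; the dictionary here is random rather than learned by clustering, and the 64-d descriptors only mimic SURF's length:

```python
import numpy as np

rng = np.random.default_rng(4)
k = 300                               # dictionary size, as in "VQ to 300"
vocab = rng.normal(size=(k, 64))      # stand-in dictionary of 64-d SURF-like words
desc = rng.normal(size=(500, 64))     # 500 local descriptors from one image

# Vector quantization: assign each descriptor to its nearest visual word
d2 = (desc ** 2).sum(1)[:, None] + (vocab ** 2).sum(1)[None, :] - 2.0 * desc @ vocab.T
words = d2.argmin(axis=1)

# The image's fixed-length bag-of-words feature: a normalized word histogram
bow = np.bincount(words, minlength=k).astype(float)
bow /= bow.sum()
```

Two domains quantized with different dictionaries (300 vs. 1000 words) thus yield histograms of different lengths, which is exactly the feature mismatch the asymmetric transform has to bridge.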
Novel-class experiments
• Test the method’s ability to transfer the learned domain shift to unseen classes
• Train the transform on half of the classes, test on the other half
(Results chart compares “Our Method” and “Our Method (linear)” against baselines)
Extreme shift example
Query from target; nearest neighbors in source using our transformation vs. nearest neighbors in source using KCCA+KNN
Conclusion
• We should not rely on hand-engineered features any more than we rely on hand-engineered models!
• Learn feature transformations across domains
• Developed a domain adaptation method based on regularized non-linear transforms
– The asymmetric transform achieves the best results on the more extreme shifts
– Saenko et al., ECCV 2010 and Kulis et al., CVPR 2011; journal version forthcoming