TRANSCRIPT
EECS 6890 – Topics in Information Processing, Spring 2014, Columbia University
http://rogerioferis.com/VisualRecognitionAndSearch2014
Jun Wang, Jan 30
Visual Recognition and Search
Visual Recognition And Search, Columbia University, Spring 2014
Brief Introduction
• About Me
– PhD from the EE Dept., Columbia Univ., 2011
“Semi-Supervised Learning for Scalable and Robust Visual Search”
– Research Staff Member (2010 - Present)
Business Analytics and Mathematical Sciences,
IBM T. J. Watson Research Center
• About You - Background
– Machine learning
– Linear Algebra
– Optimization
– Probability and Statistics
Lecture 2: Machine Learning Fundamentals
• Definition
– A branch of artificial intelligence that concerns the construction
and study of systems that can learn from data (Wikipedia)
• Related Columbia Courses
– Machine Learning COMS 4771
http://www.cs.columbia.edu/~jebara/4771/
• Book
C. M. Bishop, Pattern Recognition and
Machine Learning, Springer, 2006
Overview
• Machine learning and data mining
• Representative machine learning problems
– Classification, clustering analysis, regression,
dimensionality reduction, metric learning, feature learning,
matrix completion, graph learning, ensemble learning,
kernel learning
• Major learning paradigms
– Supervised learning
– Unsupervised learning
– Semi-supervised learning
Outline
• Regression and Classification
• Clustering
• Semi-supervised learning
• Dimensionality Reduction
• Metric Learning
Linear Regression
• Linear Regression
– Training data
– Linear model
• Least Squares
– Squared error
– Optimal solution
Demo!
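The regression formulas on this slide appear to have been rendered as images and did not survive transcription. As a minimal sketch (not the slide's own demo, and with hypothetical toy data), a 1-D least-squares fit of a linear model y ≈ w·x + b has a closed-form solution:

```python
def least_squares_1d(xs, ys):
    # closed-form minimizer of the squared error sum_i (y_i - (w*x_i + b))^2
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # w = cov(x, y) / var(x); b = mean(y) - w * mean(x)
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - w * mx
    return w, b

# hypothetical training data lying exactly on y = 2x + 1
w, b = least_squares_1d([0, 1, 2, 3], [1, 3, 5, 7])
# w = 2.0, b = 1.0
```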
Logistic Regression
• Background
– 1936: Fisher method (linear discriminant analysis)
– 1940s: logistic regression
• Settings
– Input/observation: continuous variables
– Output/response: binary variable
• Example
– X = [0.0 0.2 0.7 1.0 1.1 1.4 1.5 1.7 2.1 2.5]';
– Y= [0 0 0 0 0 1 1 1 1 1]';
Logistic Regression
• Logistic Sigmoid Function
– S-shaped curve with outputs in (0, 1)
– Derivative
– Regression function (generalized linear models)
• Maximum Likelihood Estimation
– Logistic loss
– Iterative process to estimate the parameters (demo)
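The sigmoid and loss formulas here were slide images; as a sketch of the iterative estimation (plain gradient ascent, which is an assumption — the slide does not say which iterative method it uses), fitted on the X/Y example from the previous slide:

```python
import math

def sigmoid(z):
    # numerically safe logistic sigmoid, range (0, 1)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(xs, ys, lr=0.1, iters=3000):
    # maximize the log-likelihood by gradient ascent on (w, b)
    w = b = 0.0
    for _ in range(iters):
        gw = sum((y - sigmoid(w * x + b)) * x for x, y in zip(xs, ys))
        gb = sum(y - sigmoid(w * x + b) for x, y in zip(xs, ys))
        w += lr * gw
        b += lr * gb
    return w, b

# the example data from the previous slide
X = [0.0, 0.2, 0.7, 1.0, 1.1, 1.4, 1.5, 1.7, 2.1, 2.5]
Y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
w, b = fit_logistic(X, Y)
# the fitted decision rule sigmoid(w*x + b) > 0.5 separates the two label groups
```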
Linear Classification
• Linear Classifier
– Training data
– Linear classification function
• Hinge Loss
– maximum-margin classification
– Classification score
Support Vector Machine (SVM)
• Definitions
– Classification hyperplane
– Positive margin hyperplane
– Negative margin hyperplane
– Margin between the positive and negative margin hyperplanes
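The hyperplane equations on this slide were figures lost in transcription; the standard definitions, assuming a linear classifier with weight vector w and bias b (symbols not taken from the slide), are:

```latex
% classification hyperplane
w^\top x + b = 0
% positive / negative margin hyperplanes
w^\top x + b = +1, \qquad w^\top x + b = -1
% margin between the two hyperplanes
\text{margin} = \frac{2}{\|w\|}
```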
SVM Objective: Maximum-Margin
• Equivalent to minimizing
• Recall we have the training data
• Recall hinge loss
• Final objective
• Quadratic programming (quadprog function in Matlab)
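The final objective on this slide was an image; the standard maximum-margin objective with hinge loss (notation assumed, consistent with the hyperplane definitions above) is:

```latex
% maximizing the margin 2/||w|| equals minimizing ||w||^2 / 2
\min_{w,\,b}\ \frac{1}{2}\|w\|^2
  + C \sum_{i=1}^{n} \max\bigl(0,\ 1 - y_i\,(w^\top x_i + b)\bigr)
```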
SVM: Sketch Derivation of the Dual Form
• Primal problem
• Lagrange method
• SVM dual problem
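The derivation steps were slide images; the standard sketch for the separable case (multipliers α assumed, not from the slide) is:

```latex
% primal (separable case)
\min_{w,\,b}\ \tfrac{1}{2}\|w\|^2
  \quad \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1
% Lagrangian with multipliers \alpha_i \ge 0
\mathcal{L}(w, b, \alpha) = \tfrac{1}{2}\|w\|^2
  - \sum_i \alpha_i \bigl[ y_i\,(w^\top x_i + b) - 1 \bigr]
% stationarity: w = \sum_i \alpha_i y_i x_i, \qquad \sum_i \alpha_i y_i = 0
% dual problem
\max_{\alpha \ge 0}\ \sum_i \alpha_i
  - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, x_i^\top x_j
  \quad \text{s.t.}\quad \sum_i \alpha_i y_i = 0
```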
SVM: Primal and Dual Problems
• The SVM linear classifier is learned by solving the following
optimization problem
• SVM dual form: learn a linear classifier by solving an
optimization problem over the dual variables
SVM: Primal and Dual Problems
• Primal problem: solve for the primal variable
• Dual problem: solve for the dual variable
• The learned dual variable is often sparse, with few non-zero
elements
• Non-zero elements correspond to support vectors
• A sparse solution gives an efficient classification process
• A sparse solution also indicates better generalization
Non-Separable SVM
• The SVM above handles separable cases
• Data are often not linearly separable
• Relax the hard constraints with slack variables
• Penalize the slack
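The slack formulation was a slide image; the standard soft-margin objective (slack variables ξ assumed) is:

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,
\qquad \xi_i \ge 0
```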
Nonlinear SVM: Kernelization
• Data are often not linearly separable
• The power of kernelization
– Mapping the data to a higher-dimensional space
– Quadratic polynomial
• Learn a linear classifier with a feature map
Nonlinear SVM: Dual Form
• Recall the dual form
• Nonlinear SVM with feature map
• Nonlinear SVM dual problem
• Observation: inner product of feature map
Nonlinear SVM: Kernel Trick
• Quadratic polynomial
• Kernel trick
– No need to compute the feature map explicitly
– Explicitly computing the feature map is often infeasible
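The quadratic-polynomial example was a slide image; the standard identity it illustrates (for 2-D inputs, feature map φ assumed) is:

```latex
% for x, z \in \mathbb{R}^2, the feature map
\phi(x) = \bigl( x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2,\ \sqrt{2}\,x_1,\ \sqrt{2}\,x_2,\ 1 \bigr)
% never needs to be computed explicitly, because
\langle \phi(x), \phi(z) \rangle = (x^\top z + 1)^2 = k(x, z)
```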
Nonlinear SVM: Exemplar Kernels
• Linear kernel
• Polynomial kernel
– All polynomial terms up to degree
• Gaussian kernel (Radial Basis Function)
– Infinite dimensional feature map
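The kernel formulas were slide images; the standard forms (degree d and bandwidth σ assumed) are:

```latex
k_{\text{lin}}(x, z) = x^\top z
k_{\text{poly}}(x, z) = (x^\top z + 1)^d
  \quad \text{(all polynomial terms up to degree } d\text{)}
k_{\text{rbf}}(x, z) = \exp\!\left( -\frac{\|x - z\|^2}{2\sigma^2} \right)
```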
SVM with RBF Kernel
• Classification function
• RBF SVM
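The classification function was a slide image; in the dual variables, the standard RBF-SVM decision rule (α, b assumed from the dual solution) is:

```latex
f(x) = \operatorname{sign}\Bigl( \sum_{i} \alpha_i\, y_i\, k(x_i, x) + b \Bigr),
\qquad k(x_i, x) = \exp\!\left( -\frac{\|x_i - x\|^2}{2\sigma^2} \right)
```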
SVM: Resources
• SVM video demo
http://www.youtube.com/watch?v=3liCbRZPrZA
• Steve Gunn’s SVM package
http://www.isis.ecs.soton.ac.uk/resources/svminfo/
• LibSVM
a comprehensive SVM package
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
Summary: Loss Functions
• Quadratic loss
• Hinge loss
• 0-1 loss
– Logical indicator (1 if true, 0 if false)
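The loss formulas were slide images; the standard forms, for label y ∈ {−1, +1} and classifier score f(x) (symbols assumed), are:

```latex
\ell_{\text{quad}}(y, f(x)) = \bigl( y - f(x) \bigr)^2
\ell_{\text{hinge}}(y, f(x)) = \max\bigl( 0,\ 1 - y\,f(x) \bigr)
\ell_{0\text{-}1}(y, f(x)) = \mathbb{1}\bigl[\, y\,f(x) < 0 \,\bigr]
```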
Outline
• Regression and Classification
• Clustering
• Semi-supervised learning
• Dimensionality Reduction
• Metric Learning
Clustering – Unsupervised Learning
• Definition (Wikipedia)
– Clustering is the task of grouping a set of objects in such a way that
objects in the same group (called a cluster) are more similar to each
other than to those in other groups
• A popular tool for exploratory data mining in various
applications
– Client/customer grouping for better marketing
– User/product grouping for better recommendation
– Patient population grouping for improving healthcare service delivery
– Social analytics: crime, education
– Science: genotype assignment, chemical compound grouping,
climatology
– …
K-Means Clustering
• A well-known and simple method for clustering data
• An iterative process
a) Estimate the cluster centers (locations of the clusters)
b) Calculate each data point's cluster membership
c) Repeat a) and b) until nothing changes
• Matlab: IDX = kmeans(X, k)
http://en.wikipedia.org/wiki/K-means_clustering
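The iterative process above can be sketched in plain Python (a minimal stand-in for Matlab's `kmeans`, with hypothetical toy points; a deterministic "first k points" initialization is assumed, whereas real implementations randomize):

```python
def kmeans(points, k, iters=100):
    # a) initialize cluster centers with the first k points
    centers = list(points[:k])
    for _ in range(iters):
        # b) assign each point to its nearest center (squared Euclidean distance)
        labels = [
            min(range(k), key=lambda j: sum((p - c) ** 2
                                            for p, c in zip(pt, centers[j])))
            for pt in points
        ]
        # a) re-estimate each center as the mean of its member points
        new_centers = []
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            mean = (tuple(sum(coord) / len(members) for coord in zip(*members))
                    if members else centers[j])
            new_centers.append(mean)
        # c) repeat until nothing changes
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# two well-separated toy blobs
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, 2)
# the first three and the last three points end up in different clusters
```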
K-Means Application: Bag-of-Visual-Words
• Employ K-means to extract visual key words
From Y.-G. Jiang’s slides
Hierarchical Clustering
• Clustering data points and building a hierarchy of
clusters
– Rely on a distance/similarity measure
– Agglomerative: bottom-up approach
– Divisive: top-down approach
• Example
Maximum Likelihood Estimation
• Given data, estimate the underlying distributions
• Gaussian distribution
• Parameter estimation
Mixture of Gaussians
• Data are generated from a mixture of Gaussian models
• Expectation Maximization (EM)
– E-step
– M-step
(hidden variables encode each sample's cluster membership)
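The E-step/M-step formulas were slide images; a standard sketch for a K-component Gaussian mixture (responsibilities γ and weights π assumed) is:

```latex
% mixture density with weights \pi_k
p(x) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)
% E-step: responsibilities (posterior of the hidden cluster variable)
\gamma_{ik} = \frac{\pi_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
                   {\sum_{j} \pi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}
% M-step: re-estimate the parameters
N_k = \sum_i \gamma_{ik}, \qquad
\mu_k = \frac{1}{N_k} \sum_i \gamma_{ik}\, x_i, \qquad
\pi_k = \frac{N_k}{n}
```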
Clustering on a Nonlinear Data Manifold
• Beyond linearly separable data
• Graphs
• Similarity matrix
Graph Partition
• Transform the data into a similarity graph
– Each graph node is a data point
– Each graph edge measures pair-wise similarity
• Clustering can be viewed as partitioning the similarity graph
Code: http://www.cis.upenn.edu/~jshi/software/
Spectral Clustering
• Spectral graph theory
– Graph Laplacian
– The eigenvalues and eigenvectors of the graph Laplacian
provide structure and connectivity information about the graph
• Algorithm sketch
– Construct a graph to obtain a similarity matrix
– Compute the eigenvalues and eigenvectors of the Laplacian matrix
– Perform K-means clustering using the leading eigenvectors as data
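The algorithm sketch above can be written out in a few lines of NumPy (a minimal sketch with a hypothetical two-clique similarity matrix; the unnormalized Laplacian and a deterministic farthest-point K-means initialization are assumptions, not details from the slide):

```python
import numpy as np

def spectral_clustering(W, k, iters=100):
    # unnormalized graph Laplacian L = D - W
    d = W.sum(axis=1)
    L = np.diag(d) - W
    # eigenvectors for the k smallest eigenvalues carry the connectivity structure
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]
    # plain K-means on the embedded rows, deterministic farthest-point init
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(U - c, axis=1) for c in centers], axis=0)
        centers.append(U[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1),
                           axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

# similarity graph made of two disconnected cliques
W = np.zeros((6, 6))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
labels = spectral_clustering(W, 2)
# the two cliques are recovered as the two clusters
```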
Break
Outline
• Regression and Classification
• Clustering
• Semi-supervised learning
• Dimensionality Reduction
• Metric Learning
Semi-Supervised Learning (SSL) Overview
• Motivation
– Data is abundant
– Labels are expensive to obtain
– Can unlabeled data help classification?
• Key assumptions of SSL
– Smoothness: yields a preference for
decision boundaries in low-density regions
– Cluster/manifold assumption: data tend to form discrete
clusters or lie on a low-dimensional manifold
SVM with Unlabeled Data
• Recall the standard SVM
– Training data
– Learn a linear classifier
– SVM primal
• Transductive SVM
– Training data
Graph-Based SSL: Graph Propagation
• Label propagation with graphs
[Figure: input samples with sparse labels → label propagation on a graph → label inference results (unlabeled / positive / negative)]
Graph Propagation: Notation and Example
• Weight matrix
• Node degree matrix
• Label matrix (samples × classes)
[Figure: a fraction of the constructed graph]
Regularization with Graph Laplacian
• Graph Laplacian
• Normalized graph Laplacian
• An operator measuring the smoothness of a function
over the graph (Chung, Spectral Graph Theory, 1997)
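The Laplacian definitions were slide images; the standard forms, for weight matrix W and degree matrix D, are:

```latex
L = D - W, \qquad D_{ii} = \sum_j W_{ij}
\mathcal{L} = D^{-1/2} L\, D^{-1/2} = I - D^{-1/2} W D^{-1/2}
% smoothness of a function f over the graph
f^\top L f = \frac{1}{2} \sum_{i,j} W_{ij}\, (f_i - f_j)^2
```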
SSL with Graph Regularization
• The prediction function is estimated by optimizing a cost
function that combines an empirical loss with a function-smoothness term
• Two representative methods
– Gaussian Fields and Harmonic Functions (Zhu et al., ICML 2003)
– Local and Global Consistency (LGC) (Zhou et al., NIPS 2004)
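The cost function was a slide image; as a sketch in the style of LGC (prediction matrix F, label matrix Y, and trade-off μ assumed), the two terms look like:

```latex
\min_{F}\
\underbrace{\frac{1}{2} \sum_{i,j} W_{ij}
  \left\| \frac{F_i}{\sqrt{D_{ii}}} - \frac{F_j}{\sqrt{D_{jj}}} \right\|^2}
  _{\text{function smoothness}}
\; + \;
\mu \underbrace{\sum_{i} \left\| F_i - Y_i \right\|^2}_{\text{empirical loss}}
```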
Graph-Based Ranking
• A graph is more accurate for relevance ranking
• Ranking by geodesic distance (Zhou, Weston, et al., NIPS 2004)
Visual Search Using Graph Propagation
• Application: interactive visual search (Wang et al., ICML 2008)
Outline
• Regression and Classification
• Clustering
• Semi-supervised learning
• Dimensionality Reduction
• Metric Learning
Dimensionality Reduction - Embedding
• Objective: reduce the number of random variables
– Input
– Output
– Principle: minimize reconstruction error, preserve locality
• Two general categories of methods
– Linear DR: PCA and LDA
– Nonlinear DR: LLE and ISOMAP
• Three learning paradigms
– Unsupervised: PCA
– Supervised: LDA
– Semi-supervised
Locally Linear Embedding
• Objective: Preserve local linear structure
• Algorithm
– Step 1: find the k nearest neighbors of each data point
– Step 2: find the weight matrix with minimum reconstruction error
– Step 3: find the embedding with minimum reconstruction error
http://www.cs.nyu.edu/~roweis/lle/
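The reconstruction-error objectives were slide images; the standard LLE forms, with N(i) the k-nearest-neighbor set of point i (notation assumed), are:

```latex
% Step 2: reconstruction weights over the k nearest neighbors N(i)
\min_{W}\ \sum_i \Bigl\| x_i - \sum_{j \in N(i)} W_{ij}\, x_j \Bigr\|^2
\quad \text{s.t.}\quad \sum_{j} W_{ij} = 1
% Step 3: low-dimensional embedding with the weights fixed
\min_{Y}\ \sum_i \Bigl\| y_i - \sum_{j \in N(i)} W_{ij}\, y_j \Bigr\|^2
```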
ISOMAP
• Objective: find an embedding that preserves the
global nonlinear geometry of the data
– Calculate the geodesic inter-point distances on the manifold
– Perform multidimensional scaling (MDS) to derive an embedding
that preserves the geodesic distances
http://isomap.stanford.edu/
Outline
• Regression and Classification
• Clustering
• Semi-supervised learning
• Dimensionality Reduction
• Metric Learning
Distance Metric Learning
• Motivation
– Semantic gap: the semantic description often differs from the
feature representation
• Applications
– Nearest neighbor search
– Clustering (K-means)
– Graph learning
– Classification (SVM)
Mahalanobis Distances
• Squared Euclidean distance
• Mahalanobis distances
• Many metric learning approaches use the above
form.
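The distance formulas were slide images; the standard forms, with a positive semi-definite matrix M (symbol assumed), are:

```latex
d_{\text{euc}}^2(x_i, x_j) = (x_i - x_j)^\top (x_i - x_j)
d_{M}^2(x_i, x_j) = (x_i - x_j)^\top M\, (x_i - x_j),
\qquad M \succeq 0
```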
Metric Learning and Linear Projection
• Cholesky decomposition
• Rewrite the Mahalanobis distance as
• The Mahalanobis distance can be viewed as a squared
Euclidean distance after a linear projection
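This equivalence can be checked numerically (a sketch with a hypothetical 2×2 metric matrix; note NumPy's Cholesky returns the lower-triangular factor L with M = L·Lᵀ, so the projection is z → Lᵀz):

```python
import numpy as np

# a hypothetical positive-definite metric matrix M and two sample points
M = np.array([[2.0, 0.5],
              [0.5, 1.0]])
x = np.array([1.0, 2.0])
y = np.array([3.0, 0.0])

# Mahalanobis distance in quadratic form: (x - y)^T M (x - y)
diff = x - y
d_quad = float(diff @ M @ diff)

# Cholesky factor M = L L^T, so the same distance is a squared
# Euclidean distance after the linear projection z -> L^T z
L = np.linalg.cholesky(M)
d_proj = float(np.sum((L.T @ diff) ** 2))
# d_quad == d_proj
```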
Large Margin Metric Learning
• Problem setting and formulation
– Given similar sample pairs
– Minimize the distance between similar sample pairs
– Satisfy relative distance constraints
• Objective function
Large Margin Metric Learning
• LMML objective
• Formulation with slack variables
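The objective was a slide image; as a sketch of a common large-margin metric-learning formulation (in the style of LMNN; the set S of similar pairs, triples (i, j, k) for the relative constraints, and slack ξ are all assumed notation):

```latex
\min_{M \succeq 0,\ \xi \ge 0}\
\sum_{(i,j) \in \mathcal{S}} d_M^2(x_i, x_j)
  + C \sum_{(i,j,k)} \xi_{ijk}
\quad \text{s.t.}\quad
d_M^2(x_i, x_k) - d_M^2(x_i, x_j) \ge 1 - \xi_{ijk}
```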
Announcement
• Please form groups of two students and send us via
email (one email per group):
– The names of the members in the group, and
– Three preferred presentation topics, as soon as possible (no
later than Feb 06)
– Each group will have to prepare presentations in class and
work together on a project
• List of paper presentation topics and information
about length of presentations have been posted
(check the presentations page)
• Required reading for next class (check the schedule
page)