Latent (S)SVM and Cognitive Multiple People Tracker



EM: the "latent" you know

EM is an optimization algorithm that fits a mixture of Gaussians to a set of data points. When the algorithm starts, there is no clue about which points belong to which Gaussian. But the parameters of a Gaussian can be learned only if the subset of points defining it is available.

LOOP:
- gently guess how much each point contributes to each Gaussian (E-step)
- use maximum likelihood to re-estimate the optimal parameter set (M-step)
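A minimal sketch of this loop, assuming spherical Gaussians and numpy (variable names are ours, not from the slides):

import numpy as np

def em_gmm(X, k, n_iter=50, seed=0):
    # Minimal EM for a mixture of k spherical Gaussians on points X of shape (n, d).
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, k, replace=False)]   # init means with random points
    var = np.full(k, X.var())                 # one scalar variance per component
    pi = np.full(k, 1.0 / k)                  # uniform mixing weights
    for _ in range(n_iter):
        # E-step: "gently guess" each point's membership (the responsibilities)
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)            # (n, k)
        log_p = np.log(pi) - 0.5 * (d * np.log(2 * np.pi * var) + d2 / var)
        r = np.exp(log_p - log_p.max(1, keepdims=True))
        r /= r.sum(1, keepdims=True)
        # M-step: maximum-likelihood re-estimation given the soft memberships
        nk = r.sum(0)
        mu = (r.T @ X) / nk[:, None]
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        var = (r * d2).sum(0) / (d * nk)
        pi = nk / n
    return mu, var, pi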

(Slightly more complex in practice, but that's the idea.)

Guess what?

True membership is latent!

Can you give a definition of a latent variable?

What can we learn from EM?

- a latent variable cannot be observed during training
- an iterative scheme is still the best approach for solving latent problems

Mathematical framework

We still want to learn some prediction function, but together with the solution we now also have to infer the latent variable that best explains the input/output pair. (Forgive us the strong change in notation!) On the first line, the argument of the argmax is the score function; on the second line, the multiplier of the parameter vector is the feature map. We formulate the problem as a regularized minimization of the empirical loss.
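The slide's formulas are not in the transcript; in standard latent-SSVM notation (our reconstruction, matching the description above), the prediction function reads

\hat{y}(x; w),\ \hat{h}(x; w) = \operatorname*{argmax}_{y \in \mathcal{Y},\, h \in \mathcal{H}} F(x, y, h; w), \qquad F(x, y, h; w) = \langle w, \Phi(x, y, h) \rangle,

and the learning problem is

\min_{w} \; \frac{\lambda}{2} \lVert w \rVert^{2} + \frac{1}{n} \sum_{i=1}^{n} \tilde{H}_i(w),

where \tilde{H}_i is the structured hinge loss discussed next.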

Of course the structured hinge loss will be different. Why do you think so?

Mathematical framework

The loss is also going to incorporate the latent variables, since we jointly care about learning to predict solutions and latent variables.

So the structured hinge loss reduces to the following.
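The formula itself is missing from the transcript; a standard latent form consistent with the surrounding text is

\tilde{H}_i(w) = \max_{y, h} \Big[ \Delta\big((y_i, h_i^{*}), (y, h)\big) + \langle w, \Phi(x_i, y, h) \rangle \Big] - \langle w, \Phi(x_i, y_i, h_i^{*}) \rangle,

where h_i^{*} = \operatorname*{argmax}_{h} \langle w, \Phi(x_i, y_i, h) \rangle is the latent completion of the pair (x_i, y_i) (our reconstruction).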

Can you find the contradiction?

We said latent variables are not observed during training!

Latent completion

Latent completion is the crucial step designed to infer, given an input/output pair, the best latent variable that explains it! Note that it is different from the prediction step, where only the input is available and we want to jointly estimate the output and the latent variable.

In the EM example, if we have a set of points and a mixture of Gaussians fitted on those points, latent completion would be a function capable of assigning to each Gaussian a responsibility score for the existence of each point (see the sketch after this list).

Summing up, we need:
- a new feature map able to consider latent variables too
- a new loss function able to account for differences in the latent explanation as well
- a new oracle call able to solve the new version of the structured hinge loss

- a latent completion procedure able to provide a latent explanation given an input and its associated output
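For the GMM example, latent completion is exactly the responsibility computation of the E-step; reusing the conventions of the earlier sketch:

import numpy as np

def latent_completion_gmm(X, mu, var, pi):
    # Given points X and a fitted spherical mixture (mu, var, pi), return the
    # responsibility of each Gaussian for each point: the latent explanation
    # that best accounts for the observed data under the current model.
    n, d = X.shape
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)              # (n, k)
    log_p = np.log(pi) - 0.5 * (d * np.log(2 * np.pi * var) + d2 / var)
    r = np.exp(log_p - log_p.max(1, keepdims=True))
    return r / r.sum(1, keepdims=True)                                # rows sum to 1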

Don't be worried if this is a bit too much; it may take some time to gain confidence with this stuff.

Revisited version of the required functions of the SSVM

Remember the association problem, where the similarity function was parameterized and a dedicated parameter governs the reward for perceiving a different number of objects in the scene w.r.t. the previous frame. The problem can be solved in O(n^3) with the Hungarian method, which also helps us define the feature map; moreover, the Hamming loss employed is linear, so the max oracle can be easily solved (again with the Hungarian method).
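As a sketch of that association step (our naming; a weighted squared distance stands in for the parameterized similarity of the slides):

import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(prev, curr, w):
    # prev: (m, d) targets in the previous frame; curr: (n, d) detections in
    # the current frame; w: (d,) learned weights of the similarity function.
    diff = prev[:, None, :] - curr[None, :, :]     # (m, n, d) pairwise offsets
    sim = -(diff ** 2) @ w                         # (m, n) parameterized similarity
    # The Hungarian method solves the assignment in O(n^3); scipy minimizes
    # costs, so we negate the similarities.
    rows, cols = linear_sum_assignment(-sim)
    return list(zip(rows, cols))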

Ideally we could extend the similarity matrix to employ more complex features as well; can you see the problem?

Object File Theory

One of the first and most influential approaches to the problem of object correspondence is known as Object-File theory.

According to this theory, when an object is first perceived in the scene, a position marker, or spatial index, is assigned to the location occupied by that object. From then on, whenever an object is found near that particular location, both spatial and perceptual properties of the object are activated and become bound to the spatial index. The index thus becomes a pointer to the object's higher-level features. The central role of spatial information in Object-File theory has long been known as spatiotemporal dominance and can be synthesized in the following two corollaries:
- object correspondence is computed on the basis of spatiotemporal continuity, and
- object correspondence computation does not consult non-spatial properties of the object.

The direct consequence of those claims is that a currently viewed object is treated as corresponding to a previously viewed object if the object's position over time is consistent with the interpretation of a continuous, persisting entity.

A more subtle intuition is that if spatiotemporal information is consistent with the interpretation of a continuous object, object correspondence will be established even if surface-feature and identity information are inconsistent with the interpretation of correspondence.

Example: Superman (1941) - "Up in the sky, look: It's a bird. It's a plane. It's Superman!"

And computationally?

Cognitive Visual Tracking

Based on 3 decades of empirical results:
- our brain finds distance to be the only reliable feature
- motion prediction and appearance are a plus, when useful

How can we exploit the human way of coping with multiple-target tracking? (We are so good at it!)
1. Split the crowd into influence zones (latent knowledge)
2. Decide whether those zones are ambiguous (also latent)
3. Solve unambiguous associations with distance only
4. Employ higher-level features in ambiguous cases

No one will ever say that having color or motion available is bad. The problem is teaching the classifier when it can trust these features!

CAN WE LEARN 1-4 IN A UNIFIED FRAMEWORK?

Influence zones inference

Background:
- they model humans' visual attention beams
- they help in reducing the complexity of the task, as targets appearing in different influence zones do not need to be tested for association
- we use them to localize where distance alone isn't enough

How do we compute these influence zones? Again, it is based on the Hungarian algorithm evaluating spatial information only, followed by an iterative clustering procedure. Start with the Munkres solution:
- if it is given, then we are doing latent completion
- if it is predicted, we are predicting influence zones

Influence zones inference

The procedure is similar to correlation clustering, but we extended it to work with asymmetric matrices as well (H is C).

Theory in practice
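A toy version of such an iterative grouping step, under our own assumptions (this is not the authors' exact extension of correlation clustering):

import numpy as np

def influence_zones(A, thr=0.5):
    # A: (n, n) possibly asymmetric affinity matrix between targets.
    # Iteratively merge two zones whenever the affinity in either direction
    # exceeds thr; returns one zone label per target.
    n = A.shape[0]
    labels = np.arange(n)            # start: every target is its own zone
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for j in range(i + 1, n):
                if labels[i] != labels[j] and max(A[i, j], A[j, i]) > thr:
                    labels[labels == labels[j]] = labels[i]   # merge the zones
                    changed = True
    return labels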

[Pipeline diagram: all the latent stuff; occlusion handling; object files (OFs) are updated here]

Find the original meanings of correspondence, review and impletion in the supplementary material.

And back to latent SSVM

As always we need to define:
- a feature map! (always start from the prediction function if you can)
- a loss function (super easy)
- a max oracle (try to reduce it to a modified prediction step)
- AND a latent completion step (already done!)
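Wiring the four ingredients together, the usual outer loop alternates latent completion with a standard SSVM fit (a CCCP-style sketch; the interface names are ours, not the authors'):

def train_latent_ssvm(data, w0, latent_completion, fit_ssvm, n_rounds=10):
    # data: list of (x, y) training pairs; w0: initial parameters.
    # fit_ssvm: any standard structural SVM solver (cutting planes, BCFW, ...)
    # that internally uses the feature map, the loss and the max oracle.
    w = w0
    for _ in range(n_rounds):
        # 1) latent completion: best h_i explaining each (x_i, y_i) under w
        completed = [(x, y, latent_completion(x, y, w)) for x, y in data]
        # 2) re-fit a standard SSVM on the completed training set
        w = fit_ssvm(completed, w_init=w)
    return w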

Instead of starting with the Munkres solution, initialize the algorithm with

Feature Map

Loss function and Max Oracle
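The slide's formulas are not in the transcript. From what the deck states earlier (the Hamming loss employed is linear), the max oracle is the loss-augmented decoding

\max_{y, h} \Big[ \Delta_{\mathrm{Ham}}\big((y_i, h_i^{*}), (y, h)\big) + \langle w, \Phi(x_i, y, h) \rangle \Big],

which can again be solved with the Hungarian method, since the linearity of \Delta_{\mathrm{Ham}} lets it be absorbed into the assignment scores before running the assignment (our reconstruction).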

What about FW?
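Frank-Wolfe still applies: the Block-Coordinate Frank-Wolfe (BCFW) solver of Lacoste-Julien et al. (2013) needs nothing beyond the max oracle, so once latent completion has fixed each h_i^* it can serve as the inner SSVM solver unchanged (our note; the slide body was not transcribed). A single BCFW update, in our naming:

import numpy as np

def bcfw_step(w, w_i, l, l_i, x, y, h_star, lam, n,
              feature_map, loss, max_oracle):
    # One Block-Coordinate Frank-Wolfe update for training sample (x, y) with
    # latent-completed h_star; lam is the regularizer, n the training-set size.
    y_hat, h_hat = max_oracle(x, y, h_star, w)        # loss-augmented decoding
    w_s = (feature_map(x, y, h_star) - feature_map(x, y_hat, h_hat)) / (lam * n)
    l_s = loss((y, h_star), (y_hat, h_hat)) / n
    diff = w_i - w_s
    # Closed-form line search, clipped to [0, 1].
    gamma = (lam * diff @ w - l_i + l_s) / (lam * diff @ diff + 1e-12)
    gamma = float(np.clip(gamma, 0.0, 1.0))
    w_i_new = (1 - gamma) * w_i + gamma * w_s
    l_i_new = (1 - gamma) * l_i + gamma * l_s
    return w + (w_i_new - w_i), w_i_new, l + (l_i_new - l_i), l_i_new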