Toward Learning Mixture-of-Parts Pictorial Structures


Page 1: Toward Learning Mixture-of-Parts Pictorial Structures

Toward Learning Mixture-of-Parts Pictorial Structures

Robin Hess and Alan Fern

School of Electrical Engineering and Computer Science

Oregon State University

Page 2: Toward Learning Mixture-of-Parts Pictorial Structures

Alan Fern Oregon State University

Talk Objectives

• Overview of the OSU Digital Scout Project
• Describe the problem of initial formation labeling
  • Representational and inference challenges
• Mixture-of-Parts Pictorial Structures
  • Model definition
  • Inference
• Opportunities for learning
  • Parameters and structure
  • Speedup learning
  • Active learning
  • Transfer learning

Page 3: Toward Learning Mixture-of-Parts Pictorial Structures

The OSU Digital Scout Project

Objective: compute semantic interpretations of football video

Figure labels: raw video; high-level interpretation of play

• Professional and college teams spend many hours attaching semantic tags to video for database access
  • We want to make this process much more automatic
• Support computer-assisted strategic analysis of opponents

Previous work: S. Intille. Visual Recognition of Multi-Agent Action. PhD Thesis, MIT, 1999.

Page 4: Toward Learning Mixture-of-Parts Pictorial Structures

Raw Video Data

• Obtained several games' worth of home-field video from the OSU football team
  • One video file per play
  • Exact same video used by coaches
• Video shot from a single fixed location at the top of Reser Stadium
  • Camera is constantly panning and zooming

Page 5: Toward Learning Mixture-of-Parts Pictorial Structures

Registered Video Data

• Semantic interpretation requires registration of video data to football field coordinates
• Developed a robust registration approach [Hess & Fern, CVPR’07]

Figure label: planar homography
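As a rough illustration of what registration by planar homography means, the sketch below maps an image pixel to field coordinates with a 3x3 matrix. The function name `apply_homography` and the matrix `H_EXAMPLE` are hypothetical; the slides do not say how the homography is estimated or what its values are.

```python
def apply_homography(H, x, y):
    """Map image point (x, y) to field coordinates with a 3x3 homography H.

    H is a nested list [[h11, h12, h13], [h21, h22, h23], [h31, h32, h33]].
    """
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under H")
    # Divide by the homogeneous coordinate to get 2-D field coordinates.
    return u / w, v / w


# Illustrative matrix only (an assumption, not a value from the talk).
H_EXAMPLE = [
    [0.5, 0.0, 5.0],
    [0.0, 0.5, 2.0],
    [0.0, 0.0, 2.0],
]

field_x, field_y = apply_homography(H_EXAMPLE, 100, 50)
```

Because the camera pans and zooms, a separate matrix would presumably be needed for each frame.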

Page 6: Toward Learning Mixture-of-Parts Pictorial Structures

Problem: Formation Labelling

• We consider a subproblem of full play interpretation
  • Given: initial registered video frame of a play
  • Output: offensive formation (types and locations of the 11 offensive players)
• Thousands of possible formations

Figure label: player locations & types

Page 7: Toward Learning Mixture-of-Parts Pictorial Structures

Challenges in Formation Labelling

• Player appearances nearly identical
  • Appearance not useful for inferring player type
• Difficult to robustly segment individual players
  • “Part detector” style approaches are difficult to apply

Page 8: Toward Learning Mixture-of-Parts Pictorial Structures

Challenges in Formation Labelling

• Different formations can differ in subtle ways

Page 9: Toward Learning Mixture-of-Parts Pictorial Structures

Problem Constraints

• A number of hard constraints imposed by the rule book
  • Exactly 11 players
  • Exactly 7 players on the line and 4 players behind the line
  • Exactly 1 quarterback and 1 center
  • Location of center is at midfield or “hash line”
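The counting constraints above can be checked mechanically. The sketch below is one way to do so. The split of types into line and quarterback groups, the labels beyond those named in the talk (`LT`, `RT`, `RTE`, `RB`, `WR`), and the reading of `QBS` as a quarterback subtype are all assumptions. The constraint on the center's location is not covered.

```python
from collections import Counter

# Hypothetical grouping of player types; only QBS, QB, C, LG, RG and LTE
# are named in the talk, the remaining labels are illustrative.
LINE_TYPES = {"C", "LG", "RG", "LT", "RT", "LTE", "RTE"}
QUARTERBACK_TYPES = {"QB", "QBS"}


def is_legal_part_set(types, line_types=LINE_TYPES, qb_types=QUARTERBACK_TYPES):
    """Return True if a list of player types meets the counting constraints."""
    counts = Counter(types)
    on_line = sum(1 for t in types if t in line_types)
    quarterbacks = sum(counts[t] for t in qb_types)
    return (
        len(types) == 11                 # exactly 11 players
        and on_line == 7                 # exactly 7 on the line
        and len(types) - on_line == 4    # exactly 4 behind the line
        and quarterbacks == 1            # exactly 1 quarterback
        and counts["C"] == 1             # exactly 1 center
    )


EXAMPLE_FORMATION = ["C", "LG", "RG", "LT", "RT", "LTE", "RTE",
                     "QB", "RB", "RB", "WR"]
print(is_legal_part_set(EXAMPLE_FORMATION))
```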

Page 10: Toward Learning Mixture-of-Parts Pictorial Structures

Problem Constraints

• Soft constraints on relative spatial locations of players
  • Constraints strongly depend on the set of player types

Page 11: Toward Learning Mixture-of-Parts Pictorial Structures

Previous Attempt

• Intille used a KB of hard constraints to cast the task as a SAT-like problem
  • Constraints: “near”, “to the left of”, “bit of vertical space between”, etc.
• Simplified the problem by hand-labelling the field locations of the 11 players
  • Only tried to infer player types
• The approach could not be made to work well and was abandoned

S. Intille. Visual Recognition of Multi-Agent Action. PhD Thesis, MIT, 1999.

Page 12: Toward Learning Mixture-of-Parts Pictorial Structures

Structured Output Representations

• Infer type & location for each of the 11 players (22 output variables)
  • t_i ∈ {QBS, QB, C, LG, RG, LTE, ...}, 34 types
  • l_i ∈ {(0,0), (0,1), …, (n,m)}, pixel location
• Our representation must capture
  • Hard joint constraints among types
  • Soft joint constraints among locations, conditioned on types and image data
• Possible to encode constraints via standard discrete factor-graph models (e.g. CRFs, weighted CSPs, ILP, etc.)
• Such encodings appear problematic w.r.t. off-the-shelf inference techniques (?)
  • Domains of variables are huge (many values)
  • Large factors (e.g. exactly 7 “line type” players)
  • Location constraints are inherently numeric

Page 13: Toward Learning Mixture-of-Parts Pictorial Structures

Pictorial Structures

• Offensive formations can be viewed as multi-part articulated objects (parts correspond to players)
• Pictorial structure models have been successful for multi-part objects in computer vision
  • Local part appearance models
  • Deformable connections
  • Joint estimation of part locations

Figure courtesy of Fischler & Elschlager. Figure labels: simply pairwise graphical models; node values are part locations.

Page 14: Toward Learning Mixture-of-Parts Pictorial Structures


Page 15: Toward Learning Mixture-of-Parts Pictorial Structures

• When the edge structure forms a tree, can use DP to compute the MAP estimate in O(nh^2) time
  • n = # of parts, h = # of pixels
  • h^2 is often impractical
• If, in addition, d_ij(·, ·) is a Mahalanobis distance, then the computation can be done in O(nh) time!
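The tree-structured dynamic program can be sketched as follows. This is only the plain O(nh^2) version on a toy one-dimensional location grid; it does not include the distance-transform speedup that gives O(nh) for Mahalanobis costs. All names (`tree_map`, `CHILDREN`, `UNARY`, `pair_cost`) and the toy costs are assumptions made for illustration.

```python
def tree_map(children, root, unary, pair_cost, locations):
    """Min-cost placement of tree-structured parts by dynamic programming.

    children:  dict part -> list of child parts
    unary:     dict part -> dict location -> appearance cost
    pair_cost: function (parent, child, parent_loc, child_loc) -> cost
    Runs in O(n * h^2) for n parts and h candidate locations.
    """
    subtree = {}   # part -> {loc: min cost of the subtree with part at loc}
    argbest = {}   # (child, parent_loc) -> best child location

    def solve(part):
        for c in children.get(part, []):
            solve(c)
        cost = {}
        for l in locations:
            total = unary[part][l]
            for c in children.get(part, []):
                best = min(
                    locations,
                    key=lambda lc: subtree[c][lc] + pair_cost(part, c, l, lc),
                )
                argbest[(c, l)] = best
                total += subtree[c][best] + pair_cost(part, c, l, best)
            cost[l] = total
        subtree[part] = cost

    def backtrack(part, loc, assignment):
        assignment[part] = loc
        for c in children.get(part, []):
            backtrack(c, argbest[(c, loc)], assignment)

    solve(root)
    root_loc = min(locations, key=lambda l: subtree[root][l])
    assignment = {}
    backtrack(root, root_loc, assignment)
    return subtree[root][root_loc], assignment


# Toy problem: a center "C" with one child "QB" on a 5-cell line.
LOCATIONS = list(range(5))
CHILDREN = {"C": ["QB"]}
UNARY = {
    "C": {l: 3 * abs(l - 2) for l in LOCATIONS},
    "QB": {l: 2 * abs(l - 4) for l in LOCATIONS},
}


def pair_cost(parent, child, lp, lc):
    # Quadratic deformation cost around an ideal offset of +1.
    return (lc - lp - 1) ** 2


best_cost, best_assignment = tree_map(CHILDREN, "C", UNARY, pair_cost, LOCATIONS)
```

The inner `min` over child locations for every parent location is the h^2 term that the slide calls impractical when h is the number of pixels.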

Page 16: Toward Learning Mixture-of-Parts Pictorial Structures

Pictorial Structures for Football

• For a fixed set of player types, locations can be well approximated by a pictorial structure
• But part sets (i.e. player types) vary across plays
  • Can’t use standard pictorial structures for our problem
• Can we still leverage benefits of pictorial structures?

Page 17: Toward Learning Mixture-of-Parts Pictorial Structures

Mixture of Parts Pictorial Structures (MoPPS)

• Captures constraints on legal part sets via p_v
• Captures spatial constraints among parts via f
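The model equation did not survive in this transcript. One plausible way to write a model with these two ingredients is sketched below. The symbols V (part set), L (part locations) and I (image), and the exact factorization, are assumptions rather than the authors' stated definition.

```latex
\[
  P(V, L \mid I) \;\propto\; p_v(V)\, f(L, V, I),
  \qquad
  (V^{*}, L^{*}) \;=\; \arg\max_{V,\,L}\; p_v(V)\, f(L, V, I)
\]
```

Under this reading, p_v(V) would be zero for illegal part sets, so the maximization ranges only over legal ones.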

Page 18: Toward Learning Mixture-of-Parts Pictorial Structures

MoPPS Inference

• Find the MAP estimate: the most likely set of parts and their locations
• Worst case: evaluate the pictorial structure of each legal part set
  • Requires over an hour of processing for our problem
• Need a structured MoPPS representation that can be exploited for fast inference
  • We use a “MoPPS Tree”

Page 19: Toward Learning Mixture-of-Parts Pictorial Structures

MoPPS Tree Representation

• Pictorial structure for a legal part set is the projection of a global tree onto the part set
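The talk does not spell out how the projection is computed. One plausible reading, sketched below, attaches each retained part to its nearest retained ancestor in the global tree. The function `project_tree` and the toy tree `GLOBAL_PARENT` are hypothetical.

```python
def project_tree(parent, part_set):
    """Project a global tree onto a subset of its parts.

    parent:   dict part -> parent part in the global tree (root maps to None)
    part_set: set of parts to keep
    Returns a dict mapping each kept part to its nearest kept ancestor,
    or None if no ancestor is kept.
    """
    projected = {}
    for part in part_set:
        ancestor = parent[part]
        # Skip over ancestors that are not in the part set.
        while ancestor is not None and ancestor not in part_set:
            ancestor = parent[ancestor]
        projected[part] = ancestor
    return projected


# Toy global tree (illustrative labels, not the 34-part football tree).
GLOBAL_PARENT = {"C": None, "QB": "C", "FB": "QB", "RB": "FB", "LG": "C"}

# Dropping "FB" reattaches "RB" to "QB".
PROJECTED = project_tree(GLOBAL_PARENT, {"C", "QB", "RB", "LG"})
```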

Page 20: Toward Learning Mixture-of-Parts Pictorial Structures

MoPPS Tree for Football

• 34 parts in model (one for each possible player type)
• Includes local observation models
• Includes pairwise spatial constraints
• Also provides constraints for evaluating legal part sets

Page 21: Toward Learning Mixture-of-Parts Pictorial Structures


MoPPS Tree Inference

Becomes combinatorial optimization over legal part sets

We use Branch-and-Bound Search (BBS)

Page 22: Toward Learning Mixture-of-Parts Pictorial Structures

Branch-and-Bound Search

• Search nodes are part sets
  • Internal nodes represent sets of legal part sets
  • Leaves are legal part sets
• While solution not found
  • Expand least node according to ordering relation
  • Compute upper and lower bound
  • Prune any dominated node
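The search loop can be sketched as a generic best-first branch-and-bound. The version below orders nodes by lower bound, which is only one possible ordering relation, and runs on a toy problem (choose the cheapest pair of parts) rather than on pictorial structure costs. Every name and cost in it is an assumption.

```python
import heapq


def branch_and_bound(root, expand, is_leaf, lower_bound, upper_bound):
    """Best-first branch-and-bound minimization.

    lower_bound(node) -> cost no completion of node can beat
                         (assumed exact for leaves)
    upper_bound(node) -> (cost, leaf) for some legal completion of node,
                         or (inf, None) if none is found
    """
    best_cost, best_leaf = upper_bound(root)
    counter = 0  # tie-breaker so the heap never compares nodes directly
    frontier = [(lower_bound(root), counter, root)]
    while frontier:
        lb, _, node = heapq.heappop(frontier)
        if lb >= best_cost:
            continue  # dominated by the best known solution
        if is_leaf(node):
            best_cost, best_leaf = lb, node
            continue
        for child in expand(node):
            child_lb = lower_bound(child)
            if child_lb >= best_cost:
                continue  # prune dominated child
            ub, leaf = upper_bound(child)
            if ub < best_cost:
                best_cost, best_leaf = ub, leaf
            counter += 1
            heapq.heappush(frontier, (child_lb, counter, child))
    return best_cost, best_leaf


# Toy problem: pick exactly K parts with minimum total cost.
COSTS = {"A": 4, "B": 1, "C": 3, "D": 2}
NAMES = sorted(COSTS)
K = 2


def expand(node):
    start = NAMES.index(node[-1]) + 1 if node else 0
    return [node + (name,) for name in NAMES[start:]]


def is_leaf(node):
    return len(node) == K


def lower_bound(node):
    # Monotone: adding parts never reduces the cost.
    return sum(COSTS[name] for name in node)


def upper_bound(node):
    # Greedy completion with the cheapest remaining parts.
    start = NAMES.index(node[-1]) + 1 if node else 0
    rest = sorted(NAMES[start:], key=COSTS.get)[:K - len(node)]
    if len(node) + len(rest) < K:
        return float("inf"), None
    leaf = node + tuple(sorted(rest))
    return lower_bound(leaf), leaf


RESULT = branch_and_bound((), expand, is_leaf, lower_bound, upper_bound)
```

The quality of the two bound functions determines how much of the tree gets pruned.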

Page 23: Toward Learning Mixture-of-Parts Pictorial Structures

Lower Bound Computations

• Monotonicity: adding to a set of parts will never result in reduced cost
• Simply compute the pictorial structure match of the tree projected on the parts in the search node
• Can improve on this by adding cost for “missing parts”

Page 24: Toward Learning Mixture-of-Parts Pictorial Structures

Upper Bound Computations

• Match entire MoPPS tree to image data
• Use as a heuristic for quickly finding a legal completion of the current part set
• Cost of completion is an upper bound

Page 25: Toward Learning Mixture-of-Parts Pictorial Structures

MoPPS Tree Parameters for Football

• 34 parts, 3200+ legal formations
  • 16 basic player types plus subtypes
• Connections modeled as Gaussian over ideal location relative to “parent” player
  • Parameters manually set using training images
• Observation model uses two independent components
  • One based on background model
  • One based on color histogramming
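A Gaussian connection between a player and its “parent” turns into a quadratic cost once negative logs are taken. The sketch below assumes an axis-aligned (diagonal) covariance and drops the normalizing constant. The function name, parameters and example values are hypothetical, not the hand-set values from the talk.

```python
def connection_cost(child_loc, parent_loc, ideal_offset, sigma):
    """Negative log of a Gaussian over the child's offset from its parent.

    Uses a diagonal covariance with standard deviations sigma = (sx, sy)
    and omits the constant term, so the cost is 0 at the ideal location.
    """
    dx = child_loc[0] - parent_loc[0] - ideal_offset[0]
    dy = child_loc[1] - parent_loc[1] - ideal_offset[1]
    return 0.5 * ((dx / sigma[0]) ** 2 + (dy / sigma[1]) ** 2)


# Child one unit right of its ideal spot, with sx = 2 and sy = 1.
EXAMPLE_COST = connection_cost((4, 5), (0, 0), (3, 5), (2.0, 1.0))
```

This is half a squared Mahalanobis distance, the form of cost for which pictorial structure matching admits the O(nh) computation.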

Page 26: Toward Learning Mixture-of-Parts Pictorial Structures

Background Model

• Register lots of video to field model
• Learn kernel density estimate of color at each pixel
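A per-pixel kernel density estimate can be sketched as below for a single scalar color channel. The Gaussian kernel, the bandwidth and the sample values are assumptions; the talk does not state which kernel or color space is used.

```python
import math


def kde_density(samples, value, bandwidth):
    """Gaussian kernel density estimate at `value`.

    samples:   colour values seen at one field pixel across registered frames
    bandwidth: kernel standard deviation (same units as the samples)
    """
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((value - s) / bandwidth) ** 2) for s in samples
    )


# Hypothetical background samples at one pixel (e.g. turf intensity).
BACKGROUND_SAMPLES = [98.0, 100.0, 102.0]

likely_background = kde_density(BACKGROUND_SAMPLES, 100.0, 5.0)
likely_player = kde_density(BACKGROUND_SAMPLES, 200.0, 5.0)
```

A low density at a pixel suggests the observed color is not background, which is the kind of evidence an observation model for player locations could use.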

Page 27: Toward Learning Mixture-of-Parts Pictorial Structures


Page 28: Toward Learning Mixture-of-Parts Pictorial Structures


Page 29: Toward Learning Mixture-of-Parts Pictorial Structures


Results

Page 30: Toward Learning Mixture-of-Parts Pictorial Structures


Anytime Behavior: % Correct

• Exhaustive search requires close to an hour

• Greedy search is fast but achieves only 80% accuracy

• Mean-squared location error less than a yard

Page 31: Toward Learning Mixture-of-Parts Pictorial Structures

Directions: Learning MoPPS Models

• Successfully hand-coded a MoPPS model
  • Was quite time consuming to get parameters right
  • Motivates supervised structure and parameter learning
• MoPPS model takes an average of 4 minutes per play
  • Still too slow for weekly volume of game video
  • Motivates speedup learning
• MoPPS model will sometimes need to be relearned/adapted to different sets of video
  • Want to reduce labelling effort
  • Motivates active and transfer learning

Page 32: Toward Learning Mixture-of-Parts Pictorial Structures

Structure and Parameter Learning

• Goal: learn structure and parameters of MoPPS tree from labelled data
  • Assume hard constraints on legal part sets are provided
• There are algorithms for learning the structure of pictorial structures
  • Can easily modify to learn MoPPS tree
  • Easy to combine with generative parameter learning

Page 33: Toward Learning Mixture-of-Parts Pictorial Structures

Structure and Parameter Learning

• Issue: pure generative parameter learning will likely not be sufficient
  • Hand-coded model incorporates “reward terms” to make up for deficiencies in the generative observation model
  • Suggests augmenting the generative model with discriminatively trained components
• Issue: inference time of 4 minutes makes most generative training methods quite expensive
  • Suggests using approaches that do not perform full joint inference for each parameter update

Page 34: Toward Learning Mixture-of-Parts Pictorial Structures

Speedup Learning

• How can we speed up branch-and-bound search?
  • There are a number of interesting settings
• Setting 1: Given a MoPPS model & upper/lower bound functions
  • Learn effective search space operators
• Setting 2: Given a MoPPS model & search space
  • Learn more accurate upper/lower bound functions
• Setting 3: Given a MoPPS model & search space & possibly bounds
  • Learn an effective priority queue ranking function

Page 35: Toward Learning Mixture-of-Parts Pictorial Structures

Active Model Calibration

• Want to minimize labelling effort for new video set
  • Active learning and/or semi-supervised
• Want to leverage experience with previous videos
  • Transfer learning
• How can we combine these two paradigms for label-efficient active model calibration?
  • User interface is also critical
• Very rough idea:
  • Assume fixed model structure
  • Learn prior on parameters from previous data sets
  • Use prior for regularization and example selection
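To make "use prior for regularization" concrete, the sketch below shows one standard possibility: a Gaussian prior on a single scalar model parameter (say, one coordinate of an ideal offset), combined with a few labelled examples from a new video set. This is an illustration under those assumptions, not a method described in the talk, and all names and numbers are hypothetical.

```python
def map_mean(prior_mean, prior_var, data, noise_var):
    """MAP estimate of a Gaussian mean under a Gaussian prior.

    prior_mean, prior_var: prior learned from previous data sets (assumed)
    data:                  labelled values from the new video set
    noise_var:             assumed observation variance
    """
    precision = 1.0 / prior_var + len(data) / noise_var
    return (prior_mean / prior_var + sum(data) / noise_var) / precision


# With no new labels the estimate stays at the prior mean.
NO_DATA_ESTIMATE = map_mean(5.0, 2.0, [], 1.0)

# With two labels the estimate moves from the prior toward the data mean.
TWO_LABEL_ESTIMATE = map_mean(0.0, 1.0, [2.0, 2.0], 1.0)
```

The fewer labels available, the more the prior dominates, which is the sense in which it regularizes calibration on a new video set.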

Page 36: Toward Learning Mixture-of-Parts Pictorial Structures

Summary and Future Work

• New structured output challenge problem
  • We will provide a labelled data set
  • Can off-the-shelf structured learning approaches work?
• Suggests investigating lesser-studied directions
  • Speedup learning
  • Active calibration
• On the horizon
  • Applying to defensive formations
  • Full temporal play interpretation
  • Mining strategic knowledge
  • Strategic planning

Page 37: Toward Learning Mixture-of-Parts Pictorial Structures

The Digital Scout Project

http://eecs.oregonstate.edu/football