Detecting Actions, Poses, and Objects with Relational Phraselets

by Chaitanya Desai and Deva Ramanan

Presented by: Antonia Creswell

Wednesday, 5 November 14

Problem

• Humans interact with objects in a variety of ways

• Interaction with objects leads to occlusions

• There may be many people in one image


Interact in different ways:

Deva Ramanan, University of California, Irvine


http://videolectures.net/deva_ramanan/


Interactions lead to occlusions





Many people in one image





Motivation

• Articulated Skeletons

• Visual Phrases

• Poselets

http://www.urbiforge.org/index.php/Modules/UKinect2

Poselets and Their Applications in High-Level Computer Vision

Recognition using Visual Phrases Ali Farhadi, Mohammad Amin Sadeghi




http://www.cs.illinois.edu/homes/afarhad2/


http://www.cs.illinois.edu/homes/msadegh2/


Key Contributions / Technical Ideas

• Identify phraselets

• Create a model as a composite of phraselets

• Apply relational constraints between phraselets


Identify Phraselets

Position of part

Occluded or not?

Phraselet label

Feature for part i in image n:

Cluster these to get the phraselet labels

Key point: occluded and non-occluded parts are clustered separately, so they have their own set of labels!
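The clustering step on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each instance of a part is already described by a fixed-length feature vector and an occlusion flag, and it uses plain k-means as the clustering method. The names `kmeans`, `phraselet_labels`, `k_visible` and `k_occluded` are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per row of X."""
    rng = np.random.default_rng(seed)
    # Start from k distinct samples (fancy indexing returns a copy).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every sample to every centre.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):
                centers[c] = X[assign == c].mean(axis=0)
    return assign

def phraselet_labels(features, occluded, k_visible, k_occluded):
    """Cluster visible and occluded instances of one part separately.

    Visible instances receive labels 0..k_visible-1; occluded instances
    receive labels k_visible..k_visible+k_occluded-1, so the two groups
    never share a label.
    """
    features = np.asarray(features, dtype=float)
    occluded = np.asarray(occluded, dtype=bool)
    labels = np.empty(len(features), dtype=int)
    labels[~occluded] = kmeans(features[~occluded], k_visible)
    labels[occluded] = k_visible + kmeans(features[occluded], k_occluded)
    return labels
```

Offsetting the occluded cluster indices by `k_visible` is one simple way to give the two groups disjoint label sets, which is the key point of the slide.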




Relational Model

- E is the set of edges (or relations) between pairs of parts

- S is the score

Terms of the score:

• a prior that acts as a compatibility measure

• a template tuned for mixture t(i)

• a HOG feature vector

• springs that spatially constrain parts i and j

• a deformation vector computed from the offset between p_i and p_j
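To make the roles of these terms concrete, here is a hedged sketch of how such a score could be evaluated for one candidate configuration. It assumes the score is a sum of an appearance term per part (template dotted with a HOG vector), a compatibility prior per edge, and a spring term per edge. The data layout, the quadratic form of `deformation`, and all function and argument names are assumptions made for illustration; the slide itself does not show the formula.

```python
import numpy as np

def deformation(p_i, p_j):
    """Deformation vector from the offset of p_i and p_j (assumed quadratic form)."""
    dx, dy = p_i[0] - p_j[0], p_i[1] - p_j[1]
    return np.array([dx, dx * dx, dy, dy * dy], dtype=float)

def relational_score(edges, types, positions, hog, templates, springs, compat):
    """Score one configuration of parts.

    edges:     list of (i, j) pairs, the relations E
    types:     types[i] is the mixture label t(i) of part i
    positions: positions[i] is the (x, y) location p_i
    hog:       hog[i] is the HOG vector extracted at p_i (assumed precomputed)
    templates: templates[i][t] is the template of part i tuned for mixture t
    springs:   springs[(i, j)] is a 4-vector of spring weights
    compat:    compat[(i, j)] is a table of prior scores indexed by [t(i), t(j)]
    """
    score = 0.0
    for i, t_i in enumerate(types):
        score += float(np.dot(templates[i][t_i], hog[i]))       # appearance
    for (i, j) in edges:
        score += float(compat[(i, j)][types[i], types[j]])      # prior
        score += float(np.dot(springs[(i, j)],
                              deformation(positions[i], positions[j])))  # spring
    return score
```

Negative weights on the squared offsets make the spring term penalise parts that drift apart, which is the usual reading of "springs" in part-based models.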





Learning this model

part i from class: t(2)

part j from class: t(1)

Edge label: I(z(i)| z(j))

- Maximise score S

- Find max-weight spanning tree

Location and types for all parts in n

Linear model

Learn thetas to minimise:
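The tree-structure step can be illustrated with a short sketch. It assumes the edge weight I is the empirical mutual information between the phraselet labels of two parts across the training images, and it uses Kruskal's algorithm to pick a max-weight spanning tree. Both the estimator and the function names are assumptions for illustration only.

```python
import math
from collections import Counter

def mutual_information(a, b):
    """Empirical mutual information (in nats) between two label sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # sum over observed pairs of p(x, y) * log(p(x, y) / (p(x) p(y)))
    return sum((c / n) * math.log(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())

def max_spanning_tree(labels):
    """labels[i] holds the label of part i in every training image.

    Returns the edges (i, j) of a max-weight spanning tree (Kruskal).
    """
    parts = range(len(labels))
    weighted = sorted(((mutual_information(labels[i], labels[j]), i, j)
                       for i in parts for j in parts if i < j), reverse=True)
    parent = list(parts)

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:          # adding (i, j) does not create a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

Keeping the relations in a tree means the score can later be maximised exactly with dynamic programming rather than by approximate inference.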



Models learned with the tree structure:


Experimental Setup & Results

• Action Detection

• Action Classification

• Pose Evaluation considering occlusion


Action Detection



False False Positives

Top False Positives

False False Positives due to bounding box errors


Action Detection : Precision - Recall

Compared against visual phrases as a baseline
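For readers unfamiliar with how a precision-recall curve is built from detections, a minimal sketch follows. It assumes each detection has already been marked as a true or false positive (for example by a bounding-box overlap test, which is not shown), and it computes a simple non-interpolated average precision. Benchmark toolkits often use an interpolated variant, so values may differ. Function names are hypothetical.

```python
def precision_recall(scores, is_true_positive, num_ground_truth):
    """Return (precision, recall) pairs, sweeping detections by descending score."""
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    tp = fp = 0
    curve = []
    for k in order:
        if is_true_positive[k]:
            tp += 1
        else:
            fp += 1
        curve.append((tp / (tp + fp), tp / num_ground_truth))
    return curve

def average_precision(curve):
    """Non-interpolated AP: precision weighted by each increase in recall."""
    ap, prev_recall = 0.0, 0.0
    for precision, recall in curve:
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap
```

A detection that localises the person correctly but fails the overlap test is counted in `fp` here, which is how bounding box errors turn into false positives.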


Recognition using Visual Phrases Ali Farhadi, Mohammad Amin Sadeghi






Action Classification


Compared against DPM/VP, FMP, and FMP + occ


Pose Estimation

• Should report the location of all parts and flag any that are occluded

• Novel scheme for evaluating models


Pose Scores:


F1 scores:

Penalise for labelling occluded points as visible

Combines pose estimation with aspect estimation
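One way such an F1 score could be computed is sketched below. It is an assumption about the scheme, not the exact protocol: a predicted keypoint counts as a true positive only if it is reported visible, the ground truth is visible, and the two locations are within a tolerance `tol`. The tuple layout and the name `visibility_f1` are hypothetical.

```python
def visibility_f1(pred, truth, tol):
    """pred and truth are lists of (x, y, visible) tuples, one per keypoint."""
    tp = fp = fn = 0
    for (px, py, pv), (gx, gy, gv) in zip(pred, truth):
        close = (px - gx) ** 2 + (py - gy) ** 2 <= tol ** 2
        if pv and gv and close:
            tp += 1
        else:
            if pv:
                fp += 1  # reported visible, but occluded or misplaced
            if gv:
                fn += 1  # a visible ground-truth point was not recovered
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because a point labelled visible while the ground truth is occluded lowers precision, the score rewards getting both the location and the visibility right.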


Percentage of correct parts

• Reports the location of all parts, including occluded ones

• Suggests that this model predicts location of occluded parts well
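As a reminder of what the metric measures, here is a minimal sketch of one common definition of percentage of correct parts: a limb counts as correct when both predicted endpoints lie within a fraction `alpha` of the ground-truth limb length from their true positions. Several variants of this criterion exist, and the one used in the evaluation may differ, so treat `alpha=0.5` and the function name as assumptions.

```python
import math

def pcp(pred_limbs, true_limbs, alpha=0.5):
    """Each limb is a pair of endpoints ((x1, y1), (x2, y2))."""
    correct = 0
    for (pa, pb), (ga, gb) in zip(pred_limbs, true_limbs):
        limit = alpha * math.dist(ga, gb)  # tolerance scales with limb length
        if math.dist(pa, ga) <= limit and math.dist(pb, gb) <= limit:
            correct += 1
    return correct / len(true_limbs)
```

Since the metric only looks at locations, occluded limbs are scored in the same way as visible ones, which is why it can show how well occluded parts are localised.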


Strengths & Weaknesses

• Relation between parts

• Ability to predict the location of occluded parts

• Separating clusters for occluded and non-occluded parts


Questions

