visuelle perzeption für mensch- maschine schnittstellen...11.12.2009 people detection iii...
TRANSCRIPT
![Page 1: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/1.jpg)
Interactive Systems Laboratories, Universität Karlsr uhe (TH)Dr. Edgar Seemann, 02.11.09 1
Visuelle Perzeption für Mensch-Maschine Schnittstellen
Vorlesung, WS 2009
Prof. Dr. Rainer StiefelhagenDr. Edgar Seemann
Institut für AnthropomatikUniversität Karlsruhe (TH)
http://[email protected]
![Page 2: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/2.jpg)
Dr. Edgar Seemann, 02.11.09 2
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Computer Vision:
Tasks, Challenges,Performance measures
WS 2008/09
Dr. Edgar Seemann
![Page 3: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/3.jpg)
Interactive Systems Laboratories, Universität Karlsr uhe (TH)Dr. Edgar Seemann, 02.11.09 3
Termine (1)
3
TBA21.12.2009
Stereo and Optical Flow18.12.2009
Termine Thema19.10.2009 Introduction, Applications
23.10.2009 Basics: Cameras, Transformations, Color
26.10.2009 Basics: Image Processing
30.10.2009 Basics: Pattern recognition
02.11.2009 Computer Vision: Tasks, Challenges, Learning, Performance measures
06.11.2009 Face Detection I: Color, Edges (Birchfield)
09.11.2009 Project 1: Intro + Programming tips13.11.2009 Face Detection II: ANNs, SVM, Viola & Jones
16.11.2009 Project 1: Questions
20.11.2009 Face Recognition I: Traditional Approaches, Eigenfaces, Fisherfaces, EBGM
23.12.2009 Face Recognition II
27.11.2009 Head Pose Estimation: Model-based, NN, Texture Mapping, Focus of Attention
30.11.2009 People Detection I
03.12.2009 People Detection II
07.12.2009 Project 1: Student Presentations, Project 2: Intro11.12.2009 People Detection III (Part-Based Models)
14.12.2009 Scene Context and Geometry
![Page 4: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/4.jpg)
Dr. Edgar Seemann, 02.11.09 5
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
This Lecture
� Overview of computer vision� Recognition Problems/Challenges
� Learning Process
� Evaluation procedures
� Topics� Localization vs. Classification vs. Identification
� Challenges (Illumination, Intra-Class Variation etc.)
� Data Sets
� Performance Measures
� Non-Maximum Suppression
![Page 5: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/5.jpg)
Dr. Edgar Seemann, 02.11.09 6
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Recognition Problems
![Page 6: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/6.jpg)
Dr. Edgar Seemann, 02.11.09 7
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Recognition Problems
� Different Types of Recognition Problems:� Object Identification
� recognize your pencil, your dog, your car
� Object Classification� recognize any pencil, any dog, any car� also called: generic object recognition, object categorization, …
� Recognition and� Segmentation: separate pixels belonging to the foreground
(object) and the background� Localization: position of the object in the scene, pose
estimate (size/scale, orientation, 3D position)
� Terms are not used consistently in the literature
![Page 7: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/7.jpg)
Dr. Edgar Seemann, 02.11.09 8
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Classification
� Learn the properties of a whole object class
� Example:
� Challenge:� Generalize from training examples to unseen instances
� What is a class? -> Visual Categories
![Page 8: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/8.jpg)
Dr. Edgar Seemann, 02.11.09 9
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Identification
� Distinguish an object instance from other similar instances
� Example
� Challenge:� Find features which make the object unique (e.g. shape,
color)
� Generally requires a more detailed model than classification
Edgar
Rainer
![Page 9: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/9.jpg)
Dr. Edgar Seemann, 02.11.09 10
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Localization
� Find an object’s position and size within the image
� Challenge:� Multiple object instances� Huge search space for possible locations and sizes
![Page 10: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/10.jpg)
Dr. Edgar Seemann, 02.11.09 11
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Computer Vision today
� A vast amount of research interest in these areas� Universities, Industry
� Standard databases and challenges exist� Visual Object Challenge, Urban Grand Challenge
� A lot of progressin recent years
![Page 11: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/11.jpg)
Dr. Edgar Seemann, 02.11.09 12
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Challenges
![Page 12: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/12.jpg)
Dr. Edgar Seemann, 02.11.09 13
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Realistic Environments
� Computer vision systems for human computer interaction need to work in realistic environments
� Challenges we in realistic environments� Illumination
� Intra-Class Variation
� Perspective Transformations and Scale
� Articulations/Deformations
� View-Points
� Background clutter
� Occlusion
� Insufficient processing power
![Page 13: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/13.jpg)
Dr. Edgar Seemann, 02.11.09 14
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Challenges: Illumination
� Depending on the lighting conditions average pixel values can be rather bright or dark
� Overexposure or underexposure can hide certain details
� Effective local contrast normalization is essential for good performance (see e.g. [Dalal and Triggs, CVPR’05])
� For humans it still appears similar, but it’s quite different in terms of pixel values
![Page 14: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/14.jpg)
Dr. Edgar Seemann, 02.11.09 15
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Approaches: Illumination
� Histogram equalization� make sure color values
spread the full rangeof possible values
� Local normalization� Normalize the feature response in a
local area (e.g. gradient strength)
� Does not assume that all image areasare illuminated in the same manner
![Page 15: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/15.jpg)
Dr. Edgar Seemann, 02.11.09 16
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Background Clutter
� Background structures can distract detection� Background structures can have similar characteristics
as the object of interest
� Background structures may hide actual object boundaries
![Page 16: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/16.jpg)
Dr. Edgar Seemann, 02.11.09 18
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Challenges: Intra-Class Variation
� Objects often do not share a common color or texture� Difficult to derive recognition cues
� They can even be quite similar to other objects
![Page 17: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/17.jpg)
Dr. Edgar Seemann, 02.11.09 19
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Intra-Class Variation
� Model needs to be able to capture the variance
� But still retain information of how the object class is different from other classes
� Shape vs. Appearance� In some cases shape information is more promising to
characterize an object (e.g. cups, people)
� “internal regions are unreliable cues” [Dalal&Triggs]
� In some cases appearance information (i.e. color pixel values or texture) is more appropriate
![Page 18: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/18.jpg)
Dr. Edgar Seemann, 02.11.09 20
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Challenges: View-Points
� We distinguish� In-Plane rotation
� Out-Of-Plane rotation
� Objects may look completely different from various viewing angles
![Page 19: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/19.jpg)
Dr. Edgar Seemann, 02.11.09 21
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Approaches: View-Points
� In-Plane rotation:� Rotation-invariant object description (e.g. histograms)
� Computation of a dominant orientation as a reference (e.g. local descriptors, SIFT)
� Out-Of-Plane rotation:� Separate detector for different view-points
![Page 20: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/20.jpg)
Dr. Edgar Seemann, 02.11.09 22
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Perspective Transformations
� Slight changes in camera position can alter pixel values considerably
� Common approaches:� Just ignore it and hope that your model is flexible
enough to capture these differences
� Affine invariant local interest points-> compute homography from matching points
![Page 21: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/21.jpg)
Dr. Edgar Seemann, 02.11.09 23
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Scale
� Generally, we have no a-priori information on the pixel size of the object
� We also typically do not know� the real-world dimensions of the object
� the scene geometry
� Object details may be present/absent depending on the image resolution (if resolution is too small, objects are often hard to distinguish, even for humans)
![Page 22: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/22.jpg)
Dr. Edgar Seemann, 02.11.09 24
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Approaches: Scale
� Exhaustive search of all possible image locations and scales� Sliding window
� Scale estimation from local image regions� Scale-space theory
� Scale-invariant interest points
![Page 23: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/23.jpg)
Dr. Edgar Seemann, 02.11.09 25
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Sliding Window Technique
� We obtain for each position/scale a recognition score
� Parameters: scale range, scale steps, x/y-steps
� Positions with low scores can be discarded
Red: score>0.8
![Page 24: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/24.jpg)
Dr. Edgar Seemann, 02.11.09 26
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Sliding Window Technique
� Computationally expensive
� Often a very simple “pre-filter” is applied to already discard non-promising regions with minimal computation effort� E.g. discard homogeneous regions
� Examples:� Cascade of classifiers [Viola&Jones2000]
![Page 25: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/25.jpg)
Dr. Edgar Seemann, 02.11.09 27
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hciChallenges: Articulations / Deformations
� We distinguish:� Rigid objects (e.g. cars, cups)
� Non-Rigid objects (e.g. people, towels)
![Page 26: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/26.jpg)
Dr. Edgar Seemann, 02.11.09 28
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Articulations vs. Deformations
� Articulated objects� Models for different articulations
(e.g. body parts and configurations) � Ideally, an object model should only allow
configurations, which are common or evenpossible based
� Models can be � hand-crafted (often used for tracking)� automatic (more common in recognition)
� Deformable objects� Widely unsolved problem� Often the object’s texture is used as recognition cue
![Page 27: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/27.jpg)
Dr. Edgar Seemann, 02.11.09 29
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Challenges: Occlusion
� In realistic environments the object may not be always completely visible� Not only missing data, but also distracting data
� Need to find a way to ignore occluded portions of the object
![Page 28: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/28.jpg)
Dr. Edgar Seemann, 02.11.09 30
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Approaches: Occlusion
1. Part-based models� Model object parts instead of the whole object
� Detect the individual object parts
2. Iteratively find image pixels which comply with the object model� E.g. Robust PCA
� Problems:� Overlapping objects: which image part belongs to which
object hypothesis?
� Background: How to distinguish hypothesis from partly visible objects from background structures
![Page 29: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/29.jpg)
Dr. Edgar Seemann, 02.11.09 31
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Challenges: Processing Power
� Many computer vision algorithms require a huge amount of computation� Hard to do in real-time
� Computer Vision community started to use graphics hardware, multi-processors, computer clusters, FPGA
� This is one reason why computer vision has advanced considerably in recent years (and still will advance)� Many things have already been done
� But there’s still much room for improvement
� Some people claim, that enough memory (and fast access to it) is the key to recognition
![Page 30: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/30.jpg)
Dr. Edgar Seemann, 02.11.09 32
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Challenges: scale, efficiency
� About half of the cerebral cortex in primates is devoted to processing visual information [Felleman and van Essen 1991]
� 3,000-30,000 human recognizable object categories
� 30+ degrees of freedom in the pose of articulated objects (humans)
� Billions of images indexed by Google Image Search (not to mention video)
� 18 billion+ prints produced from digital camera images in 2004
� 295.5 million camera phones sold in 2005
K. Grauman, B. Leibe
![Page 31: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/31.jpg)
Dr. Edgar Seemann, 02.11.09 33
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Learning Process
![Page 32: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/32.jpg)
Dr. Edgar Seemann, 02.11.09 34
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Computer Vision Algorithms
� Computer vision algorithms are typically data-driven� i.e. example data is used instead of completely hand-
crafted models� Often machine learning techniques are used to process
data
� In this lecture we consider mainly ‘supervised’methods
� Typically, we distinguish 3 phases when we develop a computer vision algorithm� Training� Validation� Testing
![Page 33: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/33.jpg)
Dr. Edgar Seemann, 02.11.09 35
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Training Data
� Object recognition performance may strongly depend on the quality of the training data
� Generally, training data should:� Cover the full range of object variations� Contain realistic background structures� Be carefully pre-processed� The more, the better
![Page 34: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/34.jpg)
Dr. Edgar Seemann, 02.11.09 36
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Preprocessing
� Preprocessing reduces the degrees of freedom for a learning algorithm� We remove the variations, the algorithm may safely
ignore
� Alignment� Resize the training objects to a common size
� Align object centers and crop an image region around the object
� Mostly accomplished by manual labeling
� Normalization� E.g. Histogram equalization
![Page 35: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/35.jpg)
Dr. Edgar Seemann, 02.11.09 37
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hciIncreasing the number of training examples
� Methods to increase the number of training examples� Mirroring
� Perturbations in x-/y-positions
� You can never have enough training data!� Joint effort: LabelMe Database
![Page 36: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/36.jpg)
Dr. Edgar Seemann, 02.11.09 38
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
From Training Data to an Object Model
1. Feature Representation� E.g. Color-Histograms, Edge information
2. Typically learning approach� E.g. SVM, Neural Networks, Bayesian approaches
3. Possibly additional hand-crafted information� Domain knowledge (e.g. cars are generally weakly
textured in the interior)
� Object restrictions (e.g. a body model)
![Page 37: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/37.jpg)
Dr. Edgar Seemann, 02.11.09 39
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Object Models
� We distinguish� Generative Models� Discriminative Models
1. Generative Models� Learn a representative model
(i.e. learn how the object looks)� Generally, only needs positive training examples� Example: Principal Component Analysis
2. Discriminative Models� Learn what distinguishes the object from the background or
other classes (i.e. only learn the differences)� Needs positive and negative training examples� Example: Support Vector Machines
![Page 38: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/38.jpg)
Dr. Edgar Seemann, 02.11.09 40
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Model and Training
� Problems:� Generalization
� We are able to recognize data, which was not in the training set
� Typically better for generative approaches
� Discriminative power� We do not get too many false detections
� Typically better for discriminative approaches
![Page 39: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/39.jpg)
Dr. Edgar Seemann, 02.11.09 41
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Which model is best?
Training sample
Test sample
Simple model
Complex model
![Page 40: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/40.jpg)
Dr. Edgar Seemann, 02.11.09 42
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Occam’s razor
� „All other things being equal, the simplest solution is the best“
� It is often unknown, what an algorithm has learned exactly� i.e. make models as simple as possible, but as complex
as necessary
� The same applies to the training data (e.g. reduce unnecessary degrees of freedom, remove outliers)
![Page 41: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/41.jpg)
Dr. Edgar Seemann, 02.11.09 43
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Overfitting and Learning
� Overfitting� Occurs when the model has too many parameters and
may perfectly fit the training data
Training cycles
Err
or
Error on test data
Error on training data
Best generalization
![Page 42: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/42.jpg)
Dr. Edgar Seemann, 02.11.09 44
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Validation Data
� Overfitting can be avoided/detected by using a validation data set� If performance on validation set
drops, we stop training the model
� Problem:� Amount of available data is limited
� Solution:� Cross-Validation (e.g. leave-one-out)
![Page 43: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/43.jpg)
Dr. Edgar Seemann, 02.11.09 45
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Leave-One-Out
� Divide training data into k-partitions� Repeat for all possible partitions i
� Choose partition i as a validation set� Use the remaining k-1 partitions to train the model
� Combine the parameters of the resulting k models for the final model
� Positive:� Does not waste data
� Negative:� High-Computational effort
![Page 44: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/44.jpg)
Dr. Edgar Seemann, 02.11.09 46
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Testing
� Run algorithm
� Measure performance (this is not completely trivial)
![Page 45: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/45.jpg)
Dr. Edgar Seemann, 02.11.09 47
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Performance Measures
![Page 46: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/46.jpg)
Dr. Edgar Seemann, 02.11.09 48
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Performance Measures
� Measuring the performance of object recognition algorithms is not trivial
� There different measures depending on the application
1. For classification (i.e. yes/no decision, if object is present or not)� ROC (Receiver-Operating-Characteristic)
2. For localization (i.e. detecting the object’s position)� RPC (Recall-Precision-Curve)� DET (Detection Error Trade-Off)
![Page 47: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/47.jpg)
Dr. Edgar Seemann, 02.11.09 49
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Classifying a hypothesis
� When comparing recognition hypotheses with ground-truth annotations have to consider four cases:
� Example:
Prediction: Yes No Yes No
Case: TP FN FP TN
![Page 48: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/48.jpg)
Dr. Edgar Seemann, 02.11.09 50
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
ROC
� Used for the task of classification� Measures the trade-off between true positive rate
and false positive rate:
� Example:� Algorithm X detects 80% of all cups (true positive
rate), while making 25% error on images not containing cups
![Page 49: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/49.jpg)
Dr. Edgar Seemann, 02.11.09 51
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
ROC
� Each prediction hypothesis has generally an associated probability value or score
� The performance values can therefore plotted into a graph for each possible score as a threshold
1
1
15% TPR, 1% FPR
35% TPR, 3% FPR
60% TPR, 11% FPR
72% TPR, 30% FPR
85% TPR, 68% FPR
100% TPR, 100% FPR
True positive rate
false positive rate
ROC
heaven
ROC
hell
![Page 50: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/50.jpg)
Dr. Edgar Seemann, 02.11.09 52
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Recall-Precision
� For localization ROC is not appropriate, since the number of hypothesis varies
� We therefore define:� Recall: percentage of objects found
� Precision: percentage of correct hypotheses
classification localization
![Page 51: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/51.jpg)
Dr. Edgar Seemann, 02.11.09 53
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Plotting Recall-Precision
� RPC are typically plotted on a recall vs. 1-precision scale:
1
1
15% recall, 99% precision
35% TPR, 97% precision
60% recall, 89% precision
70% recall, 68% precision
72% TPR, 70% precision
1-precision
recall
![Page 52: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/52.jpg)
Dr. Edgar Seemann, 02.11.09 54
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Real-World Example
![Page 53: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/53.jpg)
Dr. Edgar Seemann, 02.11.09 55
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Detection Error Trade-Off
� DET measures the number of false detections per tested image window with respect to the miss-rate (1 – recall)
� Used for a sliding window based detector
� Disadvantages:� Chart more difficult
to read (e.g. log-scale)� Depends on the number
of windows tested(i.e. image size, slidingwindow parameters)
� Does not measure the performance of the complete detection system including non-maximum suppression
![Page 54: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/54.jpg)
Dr. Edgar Seemann, 02.11.09 56
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Further measures
� In order to express the performance in a single figure, the following measures are common:� Equal Error Rate (EER)
� The point where the errors for true positives and false positives are equal (i.e. points on the diagonal from (0,1) to (1,0))
� Not well-defined for RPC
� Area Under Curve (for ROC)� The area under the curve ;-)
� Mean Average Precision (for RPC)� Average precision values� Sometimes measured only at pre-defined recall values
� False Positives per Image (FPPI)
![Page 55: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/55.jpg)
Dr. Edgar Seemann, 02.11.09 57
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Localization and Ground-Truth
� For localization, the test data is mostly annotated with ground-truth bounding boxes
� It is often not obvious when to count a hypotheses as true or false detection� Misaligned hypotheses
� Double detections
ground-truth
detection
![Page 56: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/56.jpg)
Dr. Edgar Seemann, 02.11.09 58
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Comparing hypotheses to Ground-Truth
� Comparison measures1. Relative Distance
2. Cover and Overlap
1. Relative distance
![Page 57: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/57.jpg)
Dr. Edgar Seemann, 02.11.09 59
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Comparing hypotheses to Ground-Truth
2. Cover and Overlap
� There is no standard for which values to choose
� Sometimes used as threshold:� Cover> 50%, Overlap>50%, Relative Distance <0.5
� Double detections are counted as false positive
![Page 58: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/58.jpg)
Dr. Edgar Seemann, 02.11.09 60
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Example: Localization
� Algorithm output:� Image 1: (13, 34, 200, 200), Prob: 0.8
� Image 1: (341, 211, 174, 174), Prob: 0.4
� …
� Image n: (78, 12, 55, 55), Prob: 0.7
� Sort hypotheses according to the score/probability
� Classify hypotheses into true and false positives
� Increase threshold gradually
� Compute for each new hypothesis the recall and precision values
![Page 59: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/59.jpg)
Dr. Edgar Seemann, 02.11.09 61
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Non-Maximum Suppression
� A good detector will generally not only fire on the exact position
� Need to reduce the number of detections, since every additional detection (even on the object) will count as false positive detection
![Page 60: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/60.jpg)
Dr. Edgar Seemann, 02.11.09 62
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Naïve Approaches
� Based on cover/overlap� Accept the strongest hypothesis
� Remove all other hypotheses, which strongly overlap
� Work reasonably well in practice
� Cannot deal well with partially overlapping objects
![Page 61: Visuelle Perzeption für Mensch- Maschine Schnittstellen...11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Dr. Edgar Seemann, 02.11.09 5 Computer](https://reader033.vdocuments.mx/reader033/viewer/2022050207/5f5a42db059a86052452bd28/html5/thumbnails/61.jpg)
Dr. Edgar Seemann, 02.11.09 63
Com
pute
r V
isio
n fo
r H
uman
-Com
pute
r In
tera
ctio
n
Res
earc
h G
roup
, Uni
vers
itätK
arls
ruhe
(TH
)cv
:hci
Advanced approaches
� Finding modes in a non-parametric distribution� Kernel Density Estimation
� Mean-Shift Mode Estimation (MSME) (e.g. [Dalal’05])
� Pixel-based reasoning (e.g. [Leibe et al. 2004])� Infer an object segmentation
� Use segmentation to determine, which pixels belong to which object
� Clustering techniques