Computer Vision
for Computer Graphics – part 2
Mark Borg
Outline
We will be looking at the following CV areas:
Stereovision
Recovering depth information
Stereo correspondence problem
Multi-view imaging and the Plenoptic function
Applications to CG:
3D Model Acquisition
View Morphing, “bullet time” effect
Automated Visual Surveillance
Motion Detection
Background Subtraction techniques
Object Tracking
Applications to CG:
Motion Capture
Basis for Behaviour Recognition in HCI interfaces, Project Natal
Automated Visual (Video) Surveillance (AVS)
Part of Computer Vision
Too many cameras, too few operators
Applications:
Security: e.g. unauthorised access, perimeter surveillance
Safety: e.g. tunnel fire & accident detection
Improving efficiency: e.g. traffic congestion & flow control
Behaviour analysis/recognition: e.g. detecting loitering, path analysis
Situational awareness: e.g. wide-area sensor networks
Video mining: e.g. querying CCTV video, crime scene analysis
AVS Characteristics I
Can be Single or Multi-Camera systems
Multi-Camera systems:
Overlapping FOVs (Fields-of-view) vs. Disjoint FOVs
Camera Hand-Over problem
AVS Characteristics II
Top-Down vs. Bottom-Up strategies
Top-Down approach:
Hypothesis driven, model-based
Uses prior information about the scene, the objects and their dynamics to generate and verify hypotheses against the video data
Computationally very complex
Hard to achieve real-time speeds
Tends to be highly domain specific
(Image source: AVITRACK project)
AVS Characteristics III
Bottom-Up approach:
Data driven
Analyses the video data to extract salient information
Less domain specific
Computational complexity lower than for top-down approaches
Hybrid systems combine both strategies
Real-time Requirements
Must run in real-time
Generally defined to be at 25 FPS (frames per second)
Computationally intensive
Massive amounts of video data are processed
Typical video input rate for 1 camera (CCTV 4CIF PAL frame size, colour, at 25 fps):
704 × 576 × 3 × 25 = 30,412,800 bytes/s ≈ 29 MB/s
Great potential for parallelisation
Typical modules of an AVS system
Main modules (operating on the video data and producing semantic information):
Video Capture & Encoding
Motion Detection
Object Tracking
Object Categorisation / Recognition
3D Localisation
Multi-camera Data Fusion
Event Detection / Alarm Generation
Behaviour Analysis / Recognition
Human Computer Interface
Offline modules:
Camera Calibration
Scene Modelling
Example: The AVITRACK project
Airport apron surveillance
Monitoring activity around the aeroplane at the gate during servicing
8-camera system, overlapping FOVs, fibre-optic network for video transmission
Project aim: improved security and more efficient turn-arounds
Example: The AVITRACK project
4 dual-core server blades (one CPU core per camera)
Motion detection runs independently for each camera
Example: The AVITRACK project
Object tracking runs independently for each camera
Tracking uses colour, motion information and object features
Example: The AVITRACK project
Cameras are calibrated
3D position of tracked objects is estimated from image position and camera geometry
Example: The AVITRACK project
Object recognition using top-down model fitting
3D model fitting is not real-time, thus it runs on a separate thread.
Communication between the tracker and 3D model fitting is via a queue (requests expire if not serviced within a certain time window).
Example: The AVITRACK project
Observations from all 8 cameras are combined together
This improves tracking results and gives better 3D positions
A server is dedicated to the data fusion process
Example: The AVITRACK project
Scene Understanding module performs behaviour recognition using temporal and spatial logic
A dedicated server hosts behaviour recognition and the GUI
Motion Detection
Frame Difference (also called Frame-to-Frame Difference)
Background Subtraction
Not as simple as FD
Needs a background model
Frame Difference
Working with greyscale images: V = (R + G + B) / 3
Given 2 frames f_t, f_{t+1}, and some threshold K:
d_t(x, y) = |f_t(x, y) − f_{t+1}(x, y)|
m_t(x, y) = 1 if d_t(x, y) > K, 0 otherwise
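A minimal sketch of frame differencing, assuming NumPy and greyscale frames stored as 2-D uint8 arrays (the function names and the threshold value K=20 are illustrative):

```python
import numpy as np

def to_greyscale(rgb):
    """V = (R + G + B) / 3, as on the slide."""
    return (rgb.astype(np.float32).sum(axis=2) / 3.0).astype(np.uint8)

def frame_difference(frame_t, frame_t1, K=20):
    """Binary motion mask from two consecutive greyscale frames.

    frame_t, frame_t1: 2-D uint8 arrays of equal size.
    K: difference threshold (illustrative value; tune per sequence).
    """
    # Cast to int16 first so the subtraction cannot wrap around in uint8.
    d = np.abs(frame_t.astype(np.int16) - frame_t1.astype(np.int16))
    # m(x, y) = 1 where the difference exceeds the threshold K.
    return (d > K).astype(np.uint8)
```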
Background Subtraction
Background Model
Single video frame of empty scene?
Image data subject to noise
Analysing pixel value variability over several video frames...
Pixel (100,100) over 200 background frames...
How can we model this? As a Gaussian (Normal) Distribution
Gaussian (Normal) Distribution
The Gaussian is described by 2 parameters:
Mean μ
Standard deviation σ
f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²))
Gaussian (Normal) Distribution II
Up to 1 standard deviation from the mean accounts for 68.27% of all values
Up to 2 standard deviations from the mean accounts for 95.45%
Up to 3 standard deviations from the mean accounts for 99.73%
Hence, the 3-sigma rule: nearly all values lie within 3 standard deviations of the mean, i.e., |x − μ| ≤ 3σ
Background Model II
Modelling the background pixel with a Gaussian: μ = 106.6, σ = 1.0573
If the value of this pixel in any video frame does not obey the 3-sigma rule, i.e., is outside the range
106.6 ± 3 × 1.0573 = [103.4, 109.7]
then we can consider it as not explained by our background model, and hence probably due to motion.
Background Model Learning
Normally an offline process
In the test video sequence supplied, there are 200 video frames (8 seconds) available for background learning
Gaussian parameters are calculated per pixel, where N is the number of background frames:
μ = (1/N) Σ xᵢ,  σ² = (1/N) Σ xᵢ² − μ²
While iterating through the background frames, we need to keep only two running values per pixel:
• sum of pixel values, and
• sum of pixel values squared
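A possible implementation of this per-pixel accumulation (a sketch assuming NumPy; array and function names are illustrative):

```python
import numpy as np

def learn_background(frames):
    """Estimate per-pixel Gaussian parameters (mu, sigma) from N background frames.

    frames: iterable of 2-D greyscale arrays (e.g. the 200 training frames).
    Only the two running sums described above are kept.
    """
    sum_x, sum_x2, n = None, None, 0
    for f in frames:
        f = f.astype(np.float64)
        if sum_x is None:
            sum_x = np.zeros_like(f)
            sum_x2 = np.zeros_like(f)
        sum_x += f          # sum of pixel values
        sum_x2 += f * f     # sum of pixel values squared
        n += 1
    mu = sum_x / n
    # sigma^2 = E[x^2] - mu^2 (clamped to avoid tiny negatives from rounding)
    sigma = np.sqrt(np.maximum(sum_x2 / n - mu * mu, 0.0))
    return mu, sigma
```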
Background Subtraction II
Given a frame f_t at time t and background model (μ, σ):
m_t(x, y) = 1 if |f_t(x, y) − μ(x, y)| > 3σ(x, y), 0 otherwise
Background subtraction is highly parallelisable
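A vectorised sketch of this test, assuming NumPy and the mu, sigma arrays produced by the learning sketch above:

```python
import numpy as np

def background_subtract(frame, mu, sigma):
    """Binary motion mask: 1 where a pixel breaks the 3-sigma rule of its Gaussian."""
    d = np.abs(frame.astype(np.float64) - mu)
    return (d > 3.0 * sigma).astype(np.uint8)
```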
Motion Detection results
Background subtraction running on the supplied test video:
Motion Detection results II
Some observations:
Issue of shadows
Noise in motion result (false positives)
These are mostly 'isolated' pixels, i.e., not supported by their neighbours!
Eliminating noise in the motion bitmap
Motion pixels not supported by their neighbours are most probably false
For each foreground (motion) pixel:
Count the number of neighbouring pixels that are also foreground
If count < K, then set the pixel to background (i.e. not a motion pixel)
Note: K is typically set to 3
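A sketch of this filter (assuming NumPy; plain array slicing keeps it dependency-light):

```python
import numpy as np

def remove_isolated_pixels(mask, K=3):
    """Clear foreground pixels with fewer than K foreground neighbours (8-connectivity)."""
    padded = np.pad(mask, 1, mode='constant')
    # Sum of the 8 neighbours of every pixel.
    neighbours = (padded[:-2, :-2] + padded[:-2, 1:-1] + padded[:-2, 2:] +
                  padded[1:-1, :-2] +                    padded[1:-1, 2:] +
                  padded[2:, :-2]  + padded[2:, 1:-1]  + padded[2:, 2:])
    cleaned = mask.copy()
    cleaned[(mask == 1) & (neighbours < K)] = 0
    return cleaned
```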
Motion Detection results III
After eliminating isolated foreground pixels:
The result has improved, though there are still small groups of false positives...
...these will be eliminated later when we do blob size filtering.
Background Model Update
The background model can get out of date!
Variation in illumination levels
Sudden changes (e.g., clouds, specular reflections)
Gradual changes (e.g., light levels during the day)
Objects can become stationary (e.g., a car is parked)
The background model needs updating
Running average: μ_t = α f_t + (1 − α) μ_{t−1}
Typically α is set to something like 0.05
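A sketch of the running-average update (assuming NumPy; updating σ as well, or updating only pixels currently classified as background, are common refinements not shown here):

```python
import numpy as np

def update_background(mu, frame, alpha=0.05):
    """Running average: mu_t = alpha * f_t + (1 - alpha) * mu_(t-1)."""
    return alpha * frame.astype(np.float64) + (1.0 - alpha) * mu
```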
From foreground pixels to blobs
The motion detection process treats each pixel independently of the others
We must now group the foreground pixels together to form 'blobs'
Hopefully a blob will correspond to an object in the real world
Not always a 1-1 correspondence:
Object fragmentation, e.g., caused by misdetection
Object merging, e.g., caused by occlusion
The algorithm used here for grouping foreground pixels is called Connected Component Labelling
Connected Component Labelling
Algorithm:
Input: motion bitmap
Output: a set of labelled connected components (blobs)
A 2-pass algorithm, i.e., the image data is traversed twice
Using 8-connectivity in this case
Connected Component Labelling II
Algorithm, on the 1st pass:
1. Iterate through each pixel:
2. If the pixel is foreground:
   1. Get the foreground neighbours of the pixel
   2. If none found, uniquely label the current pixel and continue
   3. Otherwise, find the neighbour with the smallest label and assign it to the current pixel
   4. If more than one label is found, store the equivalence between neighbouring labels
On the 2nd pass:
1. Iterate through each pixel:
2. If the pixel is foreground:
   1. Re-label the pixel with the lowest equivalent label
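A sketch of the two-pass algorithm (assuming a NumPy 0/1 motion mask; the equivalence table is kept in a small union-find structure, and only the already-visited neighbours W, NW, N, NE are examined during the raster scan):

```python
import numpy as np

def connected_components(mask):
    """Two-pass connected component labelling with 8-connectivity.

    mask: 2-D array of 0/1. Returns an int array of blob labels (0 = background).
    """
    labels = np.zeros(mask.shape, dtype=np.int32)
    parent = [0]                            # union-find table; parent[l] == l for roots

    def find(l):
        while parent[l] != l:
            parent[l] = parent[parent[l]]   # path halving
            l = parent[l]
        return l

    h, w = mask.shape
    next_label = 1
    # 1st pass: label pixels, recording equivalences between touching labels.
    for y in range(h):
        for x in range(w):
            if not mask[y, x]:
                continue
            neigh = [labels[y, x - 1] if x > 0 else 0,
                     labels[y - 1, x - 1] if y > 0 and x > 0 else 0,
                     labels[y - 1, x] if y > 0 else 0,
                     labels[y - 1, x + 1] if y > 0 and x < w - 1 else 0]
            neigh = [l for l in neigh if l > 0]
            if not neigh:
                parent.append(next_label)   # new label is its own root
                labels[y, x] = next_label
                next_label += 1
            else:
                smallest = min(neigh)
                labels[y, x] = smallest
                for l in neigh:             # store the equivalences
                    ra, rb = find(l), find(smallest)
                    if ra != rb:
                        parent[max(ra, rb)] = min(ra, rb)
    # 2nd pass: re-label each pixel with its lowest equivalent label.
    for y in range(h):
        for x in range(w):
            if labels[y, x]:
                labels[y, x] = find(labels[y, x])
    return labels
```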
Blob Descriptors
The following blob information needs to be extracted:
Blob area/size, i.e., the number of pixels
Blob bounding box
This information can be collected during Connected Component Labelling to avoid performing multiple passes over the image
Blob extraction results
After performing connected component labelling:
Note the small spurious blobs caused by noise in the motion detector
These will be solved through blob size filtering
Blob Size Filtering
Small blobs are caused by noise in the motion detection process
Solution: filter blobs by their area
For each blob B:
If its area < minSize:
Delete blob B
Note: minSize = 50 is a good threshold for the video sequence used in this assignment
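A sketch combining blob descriptor collection with size filtering (assuming the label image from the labelling sketch above; the descriptor fields match the 'Blob Descriptors' slide):

```python
def extract_blobs(labels, min_size=50):
    """Collect area and bounding box per blob, then apply blob size filtering."""
    blobs = {}  # label -> {'area', 'x0', 'y0', 'x1', 'y1'}
    h, w = labels.shape
    for y in range(h):
        for x in range(w):
            l = labels[y, x]
            if l == 0:
                continue  # background pixel
            b = blobs.setdefault(l, {'area': 0, 'x0': x, 'y0': y, 'x1': x, 'y1': y})
            b['area'] += 1
            b['x0'] = min(b['x0'], x); b['x1'] = max(b['x1'], x)
            b['y0'] = min(b['y0'], y); b['y1'] = max(b['y1'], y)
    # Blob size filtering: delete blobs with area < min_size.
    return {l: b for l, b in blobs.items() if b['area'] >= min_size}
```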
Blob Size Filtering results
After performing blob size filtering:
Number of initial blobs found: 92; number of blobs after filtering: 7
Final result
These 3 marbles appear merged into 1 blob because of partial occlusion and due to the merging of shadows.
Our motion detector has no shadow suppression component (it is beyond the scope of this assignment).
Short note on Shadow Suppression
Shadow detection and suppression is based on the observation that where a shadow falls, the background gets darker, but its colour remains unchanged
Must use a colour space that separates colour into luminance and chrominance components, so RGB can't be used:
HSV (Hue, Saturation, Value): converting from RGB to HSV is expensive
YCbCr (Luminance, Blue-difference, Red-difference): luminance channel Y, chrominance channels Cb, Cr
Short note on Shadow Suppression II
Using the method by Horprasert et al., "Robust Background Subtraction and Shadow Detection", ACCV, 2000
Motion detection is performed on channels Y, Cb, Cr, i.e., 3 Gaussians per pixel
Classify image pixels as:
• Background if Y, Cb, Cr are all within the range μ ± 3σ
• Shaded Background if Y < μ − 3σ, and Cb, Cr are within the range μ ± 3σ
• Highlighted Background if Y > μ + 3σ, and Cb, Cr are within the range μ ± 3σ
• Foreground in all other cases, where Cb and/or Cr are not within the range μ ± 3σ
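A per-pixel sketch of this classification (assuming per-channel Gaussian models learned as before; the dict-based model representation and names are illustrative):

```python
BACKGROUND, SHADOW, HIGHLIGHT, FOREGROUND = 0, 1, 2, 3

def classify_pixel(y, cb, cr, mu, sigma):
    """Classify one pixel given its Y, Cb, Cr values and per-channel Gaussians.

    mu, sigma: dicts keyed by 'Y', 'Cb', 'Cr' holding the per-pixel model values.
    """
    def within(val, c):
        return abs(val - mu[c]) <= 3.0 * sigma[c]

    if not (within(cb, 'Cb') and within(cr, 'Cr')):
        return FOREGROUND          # chrominance changed: a real object
    if within(y, 'Y'):
        return BACKGROUND          # all channels within mu +/- 3 sigma
    if y < mu['Y'] - 3.0 * sigma['Y']:
        return SHADOW              # darker, same colour: shaded background
    return HIGHLIGHT               # brighter, same colour: highlighted background
```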
Short note on Shadow Suppression III
Example results comparing motion detection without shadow removal and with shadow removal, with pixels classified as shadow, highlight, foreground and background.
Object Tracking
So far each video frame has been processed independently
The result at the end of each video frame is a list of blobs
We need to keep track of the detected blobs across video frames (across time)
Maintaining the identity of objects throughout the video sequence
(Example frames shown at time t and time t+n)
Object Tracking II
Object tracking can be considered as a correspondence problem:
Matching objects tracked in previous video frames (f_0 .. f_{t−1}) to the blobs of video frame f_t.
In tracking, the blobs of frame f_t are sometimes referred to as "observations"
Not always a 1-1 correspondence:
Objects may become occluded or partially occluded, e.g. a person walking behind a car
Objects may appear to fragment, e.g. due to misdetection of part of an object (camouflage)
Objects may appear to merge together, e.g. caused by shadow
Objects may physically combine into one, e.g. a person enters a slowly-moving car
An object may physically split into multiple ones, e.g. a person leaves a bag behind
Tracked Object Matching Criteria I
Can use:
Spatial proximity
E.g. nearest object, object intersection, bounding box overlap, etc.
Disadvantage: very simple; usually not enough on its own
Area and size
Disadvantage: very simple; usually not enough on its own
Tracked Object Matching Criteria II
Can use:
Colour information
E.g. dominant colour(s), histogram, etc.
Advantage: robust to partial occlusion
Disadvantage: might fail when objects move into shadowed areas, or rotate to show differently coloured/textured sides
Tracked Object Matching Criteria III
Can use:
Shape similarity
E.g. object shape features (like eccentricity, skew, circularity, etc.), boundary, skeleton, snakes (active contours), etc.
Disadvantage: with the exception of snakes, might fail in the case of non-rigid objects (e.g. people)
Tracked Object Matching Criteria IV
Can use:
Motion information
E.g. speed, trajectory prediction (Kalman filter), etc.
Disadvantage: might fail on non-rigid objects or (for some of the methods) in the case of non-affine motion
Tracked Object Matching Criteria V
Can use:
Local features
E.g. corners (KLT features), SIFT (Scale Invariant Feature Transform), etc.
Disadvantage: tracking result depends on the quality of the selected features; may fail on objects containing little texture
Tracked Object Matching Criteria VI
Can use:
Template matching
E.g. region correlation, etc.
Disadvantage: might fail on non-affine motion; can be very sensitive to brightness and size variation
Optical flow
Using the apparent motion of brightness patches in an image
Disadvantage: computationally expensive; requires objects to be continually moving
Many others...
Match Score Function
Matching of objects {O} to blobs {B} is normally performed through some set of weighted matching criteria {f_k}, giving a match score matrix:
score(O_i, B_j) = w_1 f_1(O_i, B_j) + w_2 f_2(O_i, B_j) + ... + w_k f_k(O_i, B_j)
where w_1 + w_2 + ... + w_k = 1.0,
O = {O_i}, i = 1..N, and B = {B_j}, j = 1..M
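A direct sketch of the weighted score (the criterion functions are illustrative placeholders; each f_k is assumed to return a similarity in [0, 1]):

```python
def match_score(obj, blob, criteria):
    """Weighted sum of matching criteria.

    criteria: list of (weight, f) pairs, where the weights sum to 1.0 and
    each f(obj, blob) returns a similarity in [0, 1].
    """
    return sum(w * f(obj, blob) for w, f in criteria)

def build_score_matrix(objects, blobs, criteria):
    """score[i][j] = match score between tracked object i and observation j."""
    return [[match_score(o, b, criteria) for b in blobs] for o in objects]
```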
Object Tracking Algorithm
Algorithm:
For each tracked object O and each blob (observation) B detected in video frame f_t:
Compute the match score value and store the result in a score matrix
Using the score matrix:
Try to match each blob to one or more tracked objects
If a unique match is found:
Update the tracked object with the new blob information
If a non-unique match is found:
Handle the case of merged objects (e.g., best match, or matching to parts of the blob)
Handle the case of an object splitting into multiple blobs
If no match is found:
Is this a potentially new tracked object?
If yes, initialise a new tracked object with the given blob
For any tracked objects not matched to blobs in frame f_t:
Mark these as 'lost' in the current frame.
If a tracked object has been 'lost' for a long time, delete it.
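A simplified, greedy sketch of this loop (one-to-one matching only; merge/split handling and globally optimal assignment are omitted; min_score, max_lost, and the TrackedObject class sketched under 'Tracked Object State' below are illustrative):

```python
def track_frame(tracked, blobs, criteria, min_score=0.5, max_lost=25):
    """One tracking step: match existing objects to this frame's blobs."""
    scores = build_score_matrix(tracked, blobs, criteria)
    matched_blobs = set()
    for i, obj in enumerate(tracked):
        # Best-scoring still-unmatched blob for this object (greedy; a real
        # system also handles merges, splits and ambiguous matches).
        best = max((j for j in range(len(blobs)) if j not in matched_blobs),
                   key=lambda j: scores[i][j], default=None)
        if best is not None and scores[i][best] >= min_score:
            obj.update(blobs[best])        # unique match: refresh the object
            matched_blobs.add(best)
        else:
            obj.mark_lost()                # not seen in this frame
    # Unmatched blobs are potentially new objects entering the scene.
    new_objects = [TrackedObject(blobs[j])
                   for j in range(len(blobs)) if j not in matched_blobs]
    # Delete objects 'lost' for too long (e.g. one second at 25 fps).
    survivors = [o for o in tracked if o.frames_lost <= max_lost]
    return survivors + new_objects
```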
Tracked Object State
Object properties:
Unique ID
Flag indicating if the object is visible in the current video frame
Age (i.e. for how many video frames it has been tracked)
When last seen (video frame #)
Flags indicating if currently in a merged state, has fragmented, etc.
Trajectory
List of image positions of the object in the previous video frames
Some form of smoothing may be required, e.g., window-based averaging
Can be used for estimating future position (e.g. Kalman filter)
Object descriptors, depending on the chosen tracking method and criteria, for example:
Bounding box
Object centre
Size
Colour information, local features, etc.
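A minimal container for this state (a sketch; the field names are illustrative and match the tracking-loop sketch above):

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)

@dataclass
class TrackedObject:
    blob: dict                                      # latest matched blob descriptors
    id: int = field(default_factory=lambda: next(_next_id))
    age: int = 0                                    # frames tracked so far
    frames_lost: int = 0                            # consecutive frames without a match
    visible: bool = True
    merged: bool = False
    trajectory: list = field(default_factory=list)  # past image positions

    def update(self, blob):
        """Unique match found: refresh state with the new observation."""
        self.blob = blob
        self.age += 1
        self.frames_lost = 0
        self.visible = True
        cx = (blob['x0'] + blob['x1']) / 2.0
        cy = (blob['y0'] + blob['y1']) / 2.0
        self.trajectory.append((cx, cy))

    def mark_lost(self):
        self.visible = False
        self.frames_lost += 1
```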
Tracking example I
The trajectory and unique ID are shown; the red outline is the bounding box of the motion detection blob; the white boundary represents the tracked object.
Tracking example II
Trajectory smoothing...
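Window-based trajectory smoothing, as mentioned under 'Tracked Object State' (a sketch; the window size is illustrative):

```python
def smooth_trajectory(points, window=5):
    """Moving-average smoothing of a list of (x, y) image positions."""
    half = window // 2
    smoothed = []
    for i in range(len(points)):
        lo, hi = max(0, i - half), min(len(points), i + half + 1)
        xs = [p[0] for p in points[lo:hi]]
        ys = [p[1] for p in points[lo:hi]]
        smoothed.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return smoothed
```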
Tracking example III
Special handling of merged objects: the ID of object 28 is correctly maintained during partial occlusion.
Tracking example IV
Without special handling of merged objects, object identity is lost: object 30 changes ID to 32, and object 30 is lost.
Tracking Demos
Let's look at some motion detection and object tracking videos…
Application of Object Tracking
We will look at one application of interest to Computer Graphics: Motion Capture
Recording the movements of actors
Mapping the movements on to a digital model
Motion Capture I
Non-CV motion capture:
Electro-mechanical exoskeleton systems
Electro-magnetic systems
Inertial systems
Advantages:
Can be quite precise
No occlusion problems
Simpler data processing
Disadvantages:
Restrictive motion
Difficult to capture facial expressions
Require manual configuration
(Image source: Animazoo)
Motion Capture II
Optical motion capture systems:
Require a multi-camera system
Use object tracking algorithms
3D position of markers obtained via stereo vision
Can suffer from problems like occlusion, tracking errors, identity swaps, etc.
Can be classified into:
Optical marker-based approaches (most commonly used)
Optical marker-less systems (still an active research area, though a commercial system is already available: Organic Motion)
Marker-based Motion Capture I
Multi-camera system
Actor with reflective markers
Marker-based Motion Capture II
Allows for facial expression capture
(Image sources: Sony Pictures ImageWorks, Mocaplab)
Marker-based Motion Capture III
3D marker trajectories / motion curves captured through object tracking
Transfer of motion to 3D models
Marker-based Motion Capture IV
Some Demos
Markerless Motion Capture
Body part and shape tracking
Silhouette-based tracking
Pose reconstruction via model fitting
May require physics-based post-processing for tracking error correction
Demo of the system by D. Vlasic et al., MIT, 2008.