
Computer Vision for Computer Graphics – part 2

Mark Borg

Outline

We will be looking at the following CV areas:

Stereovision
- Recovering depth information
- Stereo correspondence problem
- Multi-view imaging and the Plenoptic function
- Applications to CG: 3D Model Acquisition; View Morphing, the “bullet time” effect

Automated Visual Surveillance
- Motion Detection: Background Subtraction techniques
- Object Tracking
- Applications to CG: Motion Capture; basis for Behaviour Recognition in HCI interfaces, Project Natal

Automated Visual (Video) Surveillance (AVS)

Part of Computer Vision

Too many cameras, too few operators

Applications:
- Security: e.g. unauthorised access, perimeter surveillance
- Safety: e.g. tunnel fire & accident detection
- Improving efficiency: e.g. traffic congestion & flow control
- Behaviour analysis/recognition: e.g. detecting loitering, path analysis
- Situational awareness: e.g. wide-area sensor networks
- Video mining: e.g. querying CCTV video, crime scene analysis


AVS Characteristics I

Can be single-camera or multi-camera systems

Multi-camera systems:
- Overlapping FOVs (fields of view) vs. disjoint FOVs
- The camera hand-over problem: keeping an object's identity as it moves between cameras' fields of view

AVS Characteristics II

Top-Down vs. Bottom-Up strategies

Top-Down approach:
- Hypothesis driven, model-based
- Uses prior information about the scene, objects and their dynamics to generate and verify hypotheses against the video data
- Computationally very complex; hard to achieve real-time speeds
- Tends to be highly domain specific

[Image source: AVITRACK project]

AVS Characteristics III

Bottom-Up approach:
- Data driven
- Analyses the video data to extract salient information
- Less domain specific
- Computational complexity lower than for top-down approaches

Hybrid systems combine elements of both strategies.

Real-time Requirements

Must run in real-time, generally defined to be 25 FPS (frames per second)

Computationally intensive: massive amounts of video data are processed

Typical video input rate for 1 camera (CCTV 4CIF PAL frame size, colour, at 25 fps):

    704 × 576 pixels × 3 bytes × 25 fps = 30,412,800 bytes/s ≈ 29 MB/s

Great potential for parallelisation

Typical modules of an AVS system

[Block diagram. Main modules, linked by video data and semantic information flows: Video Capture & Encoding → Motion Detection → Object Tracking → Object Categorisation / Recognition → 3D Localisation → Multi-camera Data Fusion → Behaviour Analysis / Recognition → Event Detection / Alarm Generation → Human Computer Interface. Offline modules: Camera Calibration, Scene Modelling.]


Example: The AVITRACK project

Airport apron surveillance: monitoring activity around an aeroplane at the gate during servicing

8-camera system, overlapping FOVs, fibre-optic network for video transmission

Project aim: improved security and more efficient turn-arounds

Example: The AVITRACK project

4 dual-core server blades (one CPU core per camera)

Motion detection runs independently for each camera

Example: The AVITRACK project


Object tracking runs independently for each camera

Tracking uses colour, motion information and object features


Example: The AVITRACK project

Cameras are calibrated

3D position of tracked objects is estimated from image position and camera geometry

Example: The AVITRACK project

Object recognition using top-down model fitting

3D model fitting is not real-time, so it runs on a separate thread. Communication between the tracker and 3D model fitting is via a queue, with requests expiring if not serviced within a certain time window.
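As a rough illustration of this producer/consumer decoupling, here is a minimal Python sketch; it is not AVITRACK's actual code, and `MAX_AGE`, the queue size and the stub functions are all assumptions made for the example:

```python
import queue
import threading
import time

MAX_AGE = 0.5  # assumed expiry window (seconds), not AVITRACK's actual value

class FitRequest:
    """A tracked-object observation queued for 3D model fitting."""
    def __init__(self, blob):
        self.blob = blob
        self.timestamp = time.time()

fit_queue = queue.Queue(maxsize=100)

def detect_next_blob():
    time.sleep(0.04)          # stand-in for one frame of real-time tracking (25 fps)
    return object()

def fit_3d_model(blob):
    time.sleep(0.2)           # stand-in for the slow model-fitting step

def tracker_thread():
    # The real-time tracker never blocks on model fitting: it posts a
    # request and simply drops it if the queue is full.
    while True:
        blob = detect_next_blob()
        try:
            fit_queue.put_nowait(FitRequest(blob))
        except queue.Full:
            pass              # fitter is behind; skip this observation

def fitting_thread():
    # The slower 3D model fitter services requests when it can,
    # discarding any that expired before being serviced.
    while True:
        req = fit_queue.get()
        if time.time() - req.timestamp > MAX_AGE:
            continue          # request expired: not serviced within its window
        fit_3d_model(req.blob)

threading.Thread(target=tracker_thread, daemon=True).start()
threading.Thread(target=fitting_thread, daemon=True).start()
time.sleep(1)                 # let the sketch run briefly, then exit
```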

Example: The AVITRACK project

Observations from all 8 cameras are combined together: this improves the tracking results and gives better 3D positions

A server is dedicated to the data fusion process

Example: The AVITRACK project

A Scene Understanding module performs behaviour recognition using temporal and spatial logic

A dedicated server runs behaviour recognition and the GUI

Motion Detection

Two approaches:
- Frame Difference (FD), also called frame-to-frame difference
- Background Subtraction: not as simple as FD; needs a background model

Frame Difference

Working with greyscale images (greyscale value V = (R + G + B) / 3):

Given 2 frames ft, ft+1, compute the absolute difference image:

    dt(x, y) = | ft+1(x, y) − ft(x, y) |

Given some threshold K, compute the binary motion bitmap:

    mt(x, y) = 1 if dt(x, y) > K, and 0 otherwise
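A minimal NumPy sketch of these two steps (the function name and the example threshold K = 20 are mine, not the lecture's):

```python
import numpy as np

def frame_difference(frame_t, frame_t1, k=20):
    """Frame-to-frame difference on greyscale frames.

    frame_t, frame_t1: 2-D uint8 arrays (greyscale V = (R+G+B)/3
    computed upstream, e.g. rgb.mean(axis=2)).
    Returns the binary motion bitmap m_t.
    """
    # Cast to a signed type so the subtraction cannot wrap around.
    d = np.abs(frame_t1.astype(np.int16) - frame_t.astype(np.int16))
    return (d > k).astype(np.uint8)
```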


Frame Difference II

[Result images of frame differencing]

Background Subtraction

[Illustrative images]


Background Model

Can we use a single video frame of an empty scene? No: image data is subject to noise.

Analysing pixel value variability over several video frames... [plot of pixel (100,100) over 200 background frames]

How can we model this? As a Gaussian (Normal) Distribution.

Gaussian (Normal) Distribution

The Gaussian is described by 2 parameters, the mean μ and the standard deviation σ:

    f(x) = (1 / √(2πσ²)) · e^(−(x − μ)² / (2σ²))

Gaussian (Normal) Distribution II

Up to 1 standard deviation from the mean accounts for 68.27% of all values
Up to 2 standard deviations from the mean accounts for 95.45%
Up to 3 standard deviations from the mean accounts for 99.73%

Hence, the 3-sigma rule: nearly all values lie within 3 standard deviations of the mean, i.e. |x − μ| ≤ 3σ.

Background Model II

Modelling the background pixel with a Gaussian: for the example pixel above, μ = 106.6, σ = 1.0573.

If the value of this pixel in any video frame does not obey the 3-sigma rule, i.e., is outside the range

    μ ± 3σ = 106.6 ± 3 × 1.0573 ≈ (103.4, 109.7)

then we can consider it as not explained by our background model, and hence probably due to motion.


Background Model Learning

Normally an offline process.

In the test video sequence supplied, there are 200 video frames (8 seconds) available for background learning.

The Gaussian parameters are calculated per pixel as follows, where N is the number of background frames and xi is the pixel's value in frame i:

    μ = (1/N) Σ xi        σ² = (1/N) Σ xi² − μ²

While iterating through the background frames, we need to keep only two values per pixel:
- the sum of the pixel values, and
- the sum of the pixel values squared
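A sketch of the learning step (illustrative names; per pixel, keeping only the two running sums mentioned above):

```python
import numpy as np

def learn_background(frames):
    """Learn a per-pixel Gaussian background model from N frames.

    frames: iterable of greyscale frames (2-D arrays).
    Only two running totals are kept, as noted on the slide.
    Returns (mean, std) arrays with one value per pixel.
    """
    total = total_sq = None
    n = 0
    for f in frames:
        f = f.astype(np.float64)
        if total is None:
            total, total_sq = np.zeros_like(f), np.zeros_like(f)
        total += f                      # sum of pixel values
        total_sq += f * f               # sum of pixel values squared
        n += 1
    mean = total / n
    var = total_sq / n - mean * mean    # E[x^2] - (E[x])^2
    return mean, np.sqrt(np.maximum(var, 0.0))
```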

Background Subtraction II

Given a frame ft at time t and a background model (μ, σ) per pixel, mark pixel (x, y) as motion if it breaks the 3-sigma rule:

    | ft(x, y) − μ(x, y) | > 3σ(x, y)

Background subtraction is highly parallelisable: every pixel is tested independently.
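And the subtraction step itself, vectorised over the whole frame (again an illustrative sketch, building on `learn_background` above):

```python
import numpy as np

def background_subtraction(frame, mean, std):
    """Binary motion bitmap via the 3-sigma rule.

    Every pixel is tested independently, which is exactly why
    this operation parallelises (and vectorises) so well.
    """
    d = np.abs(frame.astype(np.float64) - mean)
    return (d > 3.0 * std).astype(np.uint8)
```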

Motion Detection results

Background subtraction running on the supplied test video: [result images]

Motion Detection results II

Some observations:
- Issue of shadow
- Noise in the motion result (false positives): these are mostly 'isolated' pixels, i.e., not supported by their neighbours!

Eliminating noise in the motion bitmap

Motion pixels not supported by their neighbours are most probably false.

For each foreground (motion) pixel:
- Count the number of neighbouring pixels that are also foreground
- If count < K, then set the pixel to background (i.e. not a motion pixel)

Note: K is typically set to 3 (see the sketch below)
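One way to implement this neighbour-support test in NumPy (a sketch; the padding trick is my implementation choice, K = 3 as on the slide):

```python
import numpy as np

def remove_isolated_pixels(motion, k=3):
    """Suppress foreground pixels with fewer than k foreground
    8-neighbours. motion: binary (0/1) uint8 motion bitmap."""
    padded = np.pad(motion, 1)
    h, w = motion.shape
    # Count each pixel's foreground 8-neighbours (excluding itself)
    # by summing eight shifted copies of the bitmap.
    neighbours = sum(
        padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w].astype(np.int32)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    cleaned = motion.copy()
    cleaned[(motion == 1) & (neighbours < k)] = 0
    return cleaned
```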

Motion Detection results III

After eliminating isolated foreground pixels, the result has improved, though there are still small groups of false positives... these will be eliminated later when we do blob size filtering.

Background Model Update

The background model can get out of date!
- Variation in illumination levels: sudden changes (e.g., clouds, specular reflections) and gradual changes (e.g., light levels during the day)
- Objects can become stationary (e.g., a car is parked)

The background model therefore needs updating, typically with a running average, with alpha set to something like 0.05.
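The slide's running-average formula did not survive extraction; the standard exponential update it refers to is, per pixel, μt = α·ft + (1 − α)·μt−1. A one-line sketch:

```python
def update_background(mean, frame, alpha=0.05):
    """Exponential running-average update of the background mean
    (standard form; alpha = 0.05 as suggested on the slide).
    Updating sigma the same way is omitted for brevity."""
    return alpha * frame + (1.0 - alpha) * mean
```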

From foreground pixels to blobs

The motion detection process treats each pixel independently of the others. We must now group the foreground pixels together to form 'blobs'.

Hopefully a blob will correspond to an object in the real world, but there is not always a 1-1 correspondence:
- Object fragmentation, e.g., caused by misdetection
- Object merging, e.g., caused by occlusion

The algorithm used here for grouping foreground pixels is called Connected Component Labelling.

Connected Component Labelling

Algorithm:
- Input: motion bitmap
- Output: a set of labelled connected components (blobs)
- A 2-pass algorithm, i.e., the image data is traversed twice
- Using 8-connectivity in this case

Connected Component Labelling II

Algorithm

1st pass:
1. Iterate through each pixel
2. If the pixel is foreground:
   1. Get the foreground neighbours of the pixel
   2. If none found, uniquely label the current pixel and continue
   3. Otherwise, find the neighbour with the smallest label and assign it to the current pixel
   4. If more than one label, store the equivalence between neighbouring labels

2nd pass:
1. Iterate through each pixel
2. If the pixel is foreground:
   1. Re-label the pixel with the lowest equivalent label

(A runnable sketch of the two passes follows.)
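A compact Python rendering of the algorithm (a sketch: the union-find table used here is one common way to store the label equivalences, not necessarily the lecture's choice):

```python
import numpy as np

def connected_components(motion):
    """Two-pass connected component labelling, 8-connectivity.

    motion: binary 2-D array. Returns an int32 label image
    (0 = background); equivalent labels are merged via union-find.
    """
    h, w = motion.shape
    labels = np.zeros((h, w), dtype=np.int32)
    parent = [0]                         # parent[i] = representative of label i

    def find(a):                         # root of a's equivalence class
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # 1st pass: provisional labels and equivalences.
    for y in range(h):
        for x in range(w):
            if not motion[y, x]:
                continue
            # Already-visited 8-neighbours: W, NW, N, NE.
            neigh = [labels[ny, nx]
                     for ny, nx in ((y, x - 1), (y - 1, x - 1),
                                    (y - 1, x), (y - 1, x + 1))
                     if ny >= 0 and 0 <= nx < w and labels[ny, nx] > 0]
            if not neigh:
                parent.append(len(parent))        # no labelled neighbour: new label
                labels[y, x] = len(parent) - 1
            else:
                labels[y, x] = min(neigh)         # smallest neighbouring label
                for n in neigh:                   # record label equivalences
                    ra, rb = find(labels[y, x]), find(n)
                    if ra != rb:
                        parent[max(ra, rb)] = min(ra, rb)

    # 2nd pass: re-label each pixel with its lowest equivalent label.
    for y in range(h):
        for x in range(w):
            if labels[y, x]:
                labels[y, x] = find(labels[y, x])
    return labels
```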


Blob Descriptors

The following blob information needs to be extracted:
- Blob area/size, i.e., the number of pixels
- Blob bounding box

This information can be collected during Connected Component Labelling to avoid performing multiple passes over the image. A small sketch follows.
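A sketch of the descriptor collection (shown here as a separate pass over the label image for clarity; in practice, as the slide says, the same counters would be updated inside the labelling loops):

```python
def blob_descriptors(labels):
    """Per-blob area and bounding box from a label image.

    Returns {label: (area, (min_y, min_x, max_y, max_x))}.
    """
    blobs = {}
    h, w = labels.shape
    for y in range(h):
        for x in range(w):
            lab = int(labels[y, x])
            if lab == 0:
                continue                 # background pixel
            area, (y0, x0, y1, x1) = blobs.get(lab, (0, (y, x, y, x)))
            blobs[lab] = (area + 1,
                          (min(y0, y), min(x0, x), max(y1, y), max(x1, x)))
    return blobs
```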

Blob extraction results

After performing connected component labelling: note the small spurious blobs caused by noise in the motion detector. These will be solved through blob size filtering.

Blob Size Filtering

Small blobs are caused by noise in the motion detection process.

Solution: filter blobs by their area (sketched below).

For each blob B:
    If its area < minSize:
        Delete blob B

Note: minSize = 50 is a good threshold for the video sequence used in this assignment.
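A sketch of the filter, reusing `blob_descriptors` from above (minSize = 50 as suggested for the assignment's video):

```python
import numpy as np

def filter_small_blobs(labels, blobs, min_size=50):
    """Drop blobs whose area is below min_size pixels.

    Returns the filtered label image and descriptor dictionary.
    """
    keep = [lab for lab, (area, _) in blobs.items() if area >= min_size]
    filtered = np.where(np.isin(labels, keep), labels, 0)
    return filtered, {lab: blobs[lab] for lab in keep}
```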

Blob Size Filtering results

After performing blob size filtering:
- Number of initial blobs found: 92
- Number of blobs after filtering: 7

Final result

These 3 marbles appear merged into 1 blob because of partial occlusion and the merging of shadows.

Our motion detector has no shadow suppression component (it is beyond the scope of this assignment).

Short note on Shadow Suppression

Shadow detection and suppression is based on the observation that where a shadow falls, the background gets darker, but its colour remains unchanged.

Must use a colour space that separates colour into luminance and chrominance components:
- RGB can't be used
- HSV (Hue, Saturation, Value): converting from RGB to HSV is expensive
- YCbCr (Luminance, Blue-difference, Red-difference): luminance channel Y, chrominance channels Cb, Cr

Short note on Shadow Suppression II

Using the method by Horprasert et al., "Robust Background Subtraction and Shadow Detection", ACCV, 2000.

Motion detection is performed on channels Y, Cb, Cr, i.e., 3 Gaussians per pixel.

Classify image pixels as:
- Background if Y, Cb, Cr are all within the range μ ± 3σ
- Shaded Background if Y < μ − 3σ, and Cb, Cr within the range μ ± 3σ
- Highlighted Background if Y > μ + 3σ, and Cb, Cr within the range μ ± 3σ
- Foreground in all other cases, i.e., where Cb and/or Cr are not within the range μ ± 3σ
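A sketch of this per-pixel classification in NumPy (following the slide's rules; the constants and names are illustrative, not the paper's code):

```python
import numpy as np

BACKGROUND, SHADOW, HIGHLIGHT, FOREGROUND = 0, 1, 2, 3

def classify_pixels(ycbcr, mean, std):
    """Classify pixels using per-channel Gaussians in YCbCr.

    ycbcr, mean, std: float arrays of shape (H, W, 3),
    channel order (Y, Cb, Cr).
    """
    lo, hi = mean - 3.0 * std, mean + 3.0 * std
    in_range = (ycbcr >= lo) & (ycbcr <= hi)
    chroma_ok = in_range[..., 1] & in_range[..., 2]   # Cb and Cr within mu +/- 3 sigma

    out = np.full(ycbcr.shape[:2], FOREGROUND, dtype=np.uint8)
    out[chroma_ok & (ycbcr[..., 0] < lo[..., 0])] = SHADOW     # darker, colour unchanged
    out[chroma_ok & (ycbcr[..., 0] > hi[..., 0])] = HIGHLIGHT  # brighter, colour unchanged
    out[in_range.all(axis=-1)] = BACKGROUND                    # all three channels in range
    return out
```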

Short note on Shadow Suppression III

[Result images: pixels classified as Shadow, Highlight, Foreground, Background; detection shown without and with shadow removal]

Object Tracking

So far each video frame has been processed independently; the result at the end of each video frame is a list of blobs.

We need to keep track of the detected blobs across video frames (across time), maintaining the identity of objects throughout the video sequence.

[Illustration: the same objects at time t and time t+n]

Object Tracking II

Object tracking can be considered a correspondence problem: matching objects tracked in previous video frames (f0 .. ft−1) to the blobs of video frame ft. In tracking, the blobs of frame ft are sometimes referred to as "observations".

There is not always a 1-1 correspondence:
- Objects may become occluded or partially occluded, e.g. a person walking behind a car
- Objects may appear to fragment, e.g. due to misdetection of part of an object (camouflage)
- Objects may appear to merge together, e.g. caused by shadow
- Objects may physically combine into one, e.g. a person enters a slowly-moving car
- An object may physically split into multiple ones, e.g. a person leaves a bag behind

Tracked Object Matching Criteria I

Can use:

Spatial proximity
- E.g. nearest object, object intersection, bounding box overlap, etc.
- Disadvantage: very simple; usually not enough on its own

Area and size
- Disadvantage: very simple; usually not enough on its own

Tracked Object Matching Criteria II

Can use:

Colour information
- E.g. dominant colour(s), histogram, etc.
- Advantage: robust to partial occlusion
- Disadvantage: might fail when objects move into shadowed areas, or rotate to show differently coloured/textured sides

Tracked Object Matching Criteria III

Can use:

Shape similarity
- E.g. object shape features (like eccentricity, skew, circularity, etc.), boundary, skeleton, snakes (active contours), etc.
- Disadvantage: with the exception of snakes, might fail for non-rigid objects (e.g. people)

Tracked Object Matching Criteria IV

Can use:

Motion information
- E.g. speed, trajectory prediction (Kalman filter), etc.
- Disadvantage: might fail on non-rigid objects or (for some of the methods) in the case of non-affine motion

Tracked Object Matching Criteria V

Can use:

Local features
- E.g. corners (KLT features), SIFT (Scale Invariant Feature Transform), etc.
- Disadvantage: the tracking result depends on the quality of the selected features; may fail on objects containing little texture

Tracked Object Matching Criteria VI

Can use:

Template matching
- E.g. region correlation, etc.
- Disadvantage: might fail on non-affine motion; can be very sensitive to brightness and size variation

Optical flow
- Uses the apparent motion of brightness patches in an image
- Disadvantage: computationally expensive; requires objects to be continually moving

Many others...

Match Score Function

Matching of objects {O} to blobs {B} is normally performed through some set of weighted matching criteria {ƒk}, giving a match score matrix:

    score(Oi, Bj) = w1·ƒ1(Oi, Bj) + w2·ƒ2(Oi, Bj) + ... + wk·ƒk(Oi, Bj)

    where w1 + w2 + ... + wk = 1.0,
    {Oi}, i = 1..N, are the tracked objects, and
    {Bj}, j = 1..M, are the blobs (observations).

(A sketch of the score matrix follows the tracking algorithm below.)

Object Tracking Algorithm

Algorithm:

For each tracked object O and blob (observation) B detected in video frame ft:
    Compute the match score value and store the result in a score matrix

Using the score matrix:
    Try to match each blob to one or more tracked objects
    If a unique match is found:
        Update the tracked object with the new blob information
    If a non-unique match is found:
        Handle the case of merged objects (e.g., best match, or matching to parts of the blob)
        Handle the case of an object splitting into multiple blobs
    If no match is found:
        Is this a potentially new tracked object?
        If yes, initialise a new tracked object with the given blob

For any tracked objects not matched to blobs in frame ft:
    Mark these as 'lost' in the current frame
    If a tracked object has been 'lost' for a long time, delete it
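A sketch of the score matrix from the previous slide plus one simple (greedy) way to resolve it; all names, and the `min_score` cutoff, are illustrative assumptions, and the merge/split handling above is omitted:

```python
import numpy as np

def score_matrix(objects, blobs, criteria, weights):
    """N x M match scores: score(Oi, Bj) = sum_k wk * fk(Oi, Bj).

    criteria: list of functions f(obj, blob) -> [0, 1];
    weights: matching list of wk summing to 1.0.
    """
    s = np.zeros((len(objects), len(blobs)))
    for i, obj in enumerate(objects):
        for j, blob in enumerate(blobs):
            s[i, j] = sum(w * f(obj, blob) for w, f in zip(weights, criteria))
    return s

def greedy_match(scores, min_score=0.5):
    """Repeatedly take the best remaining (object, blob) pair.

    Pairs scoring below min_score stay unmatched: those objects are
    marked 'lost' and those blobs may start new tracked objects.
    """
    matches, used_i, used_j = [], set(), set()
    flat_order = np.argsort(-scores, axis=None)       # best scores first
    for idx in flat_order:
        i, j = np.unravel_index(idx, scores.shape)
        if scores[i, j] < min_score:
            break                                     # remaining pairs score worse
        if i in used_i or j in used_j:
            continue
        matches.append((int(i), int(j)))
        used_i.add(i)
        used_j.add(j)
    return matches
```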

Tracked Object State

Object properties:
- Unique ID
- Flag indicating if the object is visible in the current video frame
- Age (i.e. for how many video frames it has been tracked)
- When last seen (video frame #)
- Flags indicating if it is currently in a merged state, has fragmented, etc.
- Trajectory: a list of the image positions of the object in the previous video frames
  - Some form of smoothing may be required, e.g., window-based averaging
  - Can be used for estimating the future position (e.g. Kalman filter)
- Object descriptors, depending on the chosen tracking method and criteria; for example: bounding box, object centre, size, colour information, local features, etc.

A possible data layout is sketched below.
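One possible layout for this state (a sketch; the field names are illustrative, not from the lecture's code):

```python
from dataclasses import dataclass, field

@dataclass
class TrackedObject:
    uid: int                          # unique ID
    visible: bool = True              # visible in the current frame?
    age: int = 0                      # number of frames tracked so far
    last_seen: int = 0                # frame # when last matched to a blob
    merged: bool = False              # currently in a merged state?
    fragmented: bool = False          # currently fragmented?
    trajectory: list = field(default_factory=list)  # [(x, y), ...] per frame
    bbox: tuple = None                # descriptor: bounding box
    centre: tuple = None              # descriptor: object centre
    size: int = 0                     # descriptor: area in pixels

    def smoothed_position(self, window=5):
        """Window-based averaging of the recent trajectory."""
        pts = self.trajectory[-window:]
        if not pts:
            return None
        n = len(pts)
        return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)
```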

Tracking example I

[Result image: each tracked object is labelled with its unique ID and trajectory. The red outline is the bounding box of the motion detection blob; the white boundary represents the tracked object.]

Tracking example II

Trajectory smoothing... [result images]

Tracking example III

Special handling of merged objects... [Result image: the ID of object 28 is correctly maintained during partial occlusion.]

Tracking example IV

Otherwise, without special handling of merged objects, object identity is lost. [Result images: object 30 changes ID to 32; object 30 is lost.]

Tracking Demos

Let's look at some motion detection and object tracking videos...

Application of Object Tracking

We will look at one application of interest to Computer Graphics: Motion Capture
- Recording the movements of actors
- Mapping the movements onto a digital model

Motion Capture I

Non-CV motion capture:
- Electro-mechanical exoskeleton systems
- Electro-magnetic systems
- Inertial systems

Advantages:
- Can be quite precise
- No occlusion problems
- Simpler data processing

Disadvantages:
- Restrictive motion
- Difficult to capture facial expressions
- Require manual configuration

[Image source: Animazoo]

Motion Capture II

Optical motion capture systems:
- Require a multi-camera system
- Use object tracking algorithms
- The 3D position of the markers is obtained via stereo vision
- Can suffer from problems like occlusion, tracking errors, identity swap, etc.

Can be classified into:
- Optical marker-based approaches: the most commonly used
- Optical marker-less systems: still an active research area, though a commercial system is already available (Organic Motion)

Marker-based Motion Capture I

[Images: a multi-camera studio setup; an actor wearing reflective markers]

Marker-based Motion Capture II

Allows for facial expression capture

[Image sources: Sony Pictures ImageWorks; Mocaplab]

Marker-based Motion Capture III

3D marker trajectories / motion curves are captured through object tracking; the motion is then transferred to 3D models.

Marker-based Motion Capture IV

Some demos...

Markerless Motion Capture

- Body part and shape tracking
- Silhouette-based tracking
- Pose reconstruction via model fitting
- May require physics-based post-processing for tracking-error correction

Demo of the system by D. Vlasic et al., MIT, 2008.


Thanks
