Computer Vision
for Computer Graphics – part 2
Mark Borg
Outline
We will be looking at the following CV areas:
Stereovision
Recovering depth information
Stereo correspondence problem
Multi-view imaging and the Plenoptic function
Applications to CG:
3D Model Acquisition
View Morphing, “bullet time” effect
Automated Visual Surveillance
Motion Detection
Background Subtraction techniques
Object Tracking
Applications to CG:
Motion Capture
Basis for Behaviour Recognition in HCI interfaces, Project Natal
Automated Visual (Video) Surveillance (AVS)
Part of Computer Vision
Too many cameras, too few operators
Applications:
Security: e.g. unauthorised access, perimeter surveillance
Safety: e.g. tunnel fire & accident detection
Improving efficiency: e.g. traffic congestion & flow control
Behaviour analysis/recognition: e.g. detecting loitering, path analysis
Situational awareness: e.g. wide-area sensor networks
Video mining: e.g. querying CCTV video, crime scene analysis
AVS Characteristics I
Can be Single or Multi-Camera systems
Multi-Camera systems:
Overlapping FOVs (Fields-of-view) vs. Disjoint FOVs
Camera Hand-Over problem
AVS Characteristics II
Top-Down vs. Bottom-Up strategies
Top-Down approach:
Hypothesis driven, model-based
Uses prior information about the scene, the objects and their dynamics to generate and verify hypotheses against the video data
Computationally very complex
Hard to achieve real-time speeds
Tends to be highly domain specific
(Image source: AVITRACK project)
AVS Characteristics III
Bottom-Up approach:
Data driven
Analyses the video data to extract salient information
Less domain specific
Computational complexity lower than for top-down approaches
Hybrid systems combine both strategies
Real-time Requirements
Must run in real-time
Generally defined to be at 25 FPS (frames per second)
Computationally intensive
Massive amounts of video data are processed
Typical video input rate for 1 camera (CCTV 4CIF PAL frame size, colour, at 25 fps):
704 × 576 × 3 × 25 = 30,412,800 bytes/s ≈ 29 MB/s
Great potential for parallelisation
Typical modules of an AVS system
Main modules (operating on the video data and producing semantic information):
Video Capture & Encoding
Motion Detection
Object Tracking
Object Categorisation / Recognition
3D Localisation
Multi-camera Data Fusion
Event Detection / Alarm Generation
Behaviour Analysis / Recognition
Human Computer Interface
Offline modules:
Camera Calibration
Scene Modelling
Example: The AVITRACK project
Airport apron surveillance
Monitoring activity around the aeroplane at the gate during servicing
8-camera system, overlapping FOVs, fibre-optic network for video transmission
Project aim: improved security and more efficient turn-arounds
Example: The AVITRACK project
4 dual-core server blades (one CPU core per camera)
Motion detection runs independently for each camera
Example: The AVITRACK project
Object tracking runs independently for each camera
Tracking uses colour, motion information and object features
Example: The AVITRACK project
Cameras are calibrated
3D position of tracked objects is estimated from image position and camera geometry
Example: The AVITRACK project
Object recognition using top-down model fitting
3D model fitting is not real-time, thus it runs on a separate thread.
Communication between the tracker and 3D model fitting is via a queue (requests expire if not serviced within a certain time window).
Example: The AVITRACK project
Observations from all 8 cameras are combined together
This improves tracking results and gives better 3D positions
A server is dedicated to the data fusion process
Example: The AVITRACK project
Scene Understanding module performs behaviour recognition using temporal and spatial logic
A dedicated server hosts behaviour recognition and the GUI
Motion Detection
Frame Difference (also called Frame-to-Frame Difference)
Background Subtraction
Not as simple as FD
Needs a background model
Frame Difference
Working with greyscale images: V = (R + G + B) / 3
Given 2 frames f_t, f_{t+1}, and some threshold K:
d_t(x, y) = |f_t(x, y) − f_{t+1}(x, y)|
m_t(x, y) = 1 if d_t(x, y) > K, 0 otherwise
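A minimal sketch of frame differencing, assuming NumPy and greyscale frames stored as 2-D uint8 arrays (the function names and the threshold value K=20 are illustrative):

```python
import numpy as np

def to_greyscale(rgb):
    """V = (R + G + B) / 3, as on the slide."""
    return (rgb.astype(np.float32).sum(axis=2) / 3.0).astype(np.uint8)

def frame_difference(frame_t, frame_t1, K=20):
    """Binary motion mask from two consecutive greyscale frames.

    frame_t, frame_t1: 2-D uint8 arrays of equal size.
    K: difference threshold (illustrative value; tune per sequence).
    """
    # Cast to int16 first so the subtraction cannot wrap around in uint8.
    d = np.abs(frame_t.astype(np.int16) - frame_t1.astype(np.int16))
    # m(x, y) = 1 where the difference exceeds the threshold K.
    return (d > K).astype(np.uint8)
```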
Background Subtraction
Background Model
Single video frame of empty scene?
Image data subject to noise
Analysing pixel value variability over several video frames...
Pixel (100,100) over 200 background frames...
How can we model this? As a Gaussian (Normal) Distribution
Gaussian (Normal) Distribution
The Gaussian is described by 2 parameters:
Mean μ
Standard deviation σ
f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²))
Gaussian (Normal) Distribution II
Up to 1 standard deviation from the mean accounts for 68.27% of all values
Up to 2 standard deviations from the mean accounts for 95.45%
Up to 3 standard deviations from the mean accounts for 99.73%
Hence, the 3-sigma rule: nearly all values lie within 3 standard deviations of the mean, i.e., |x − μ| ≤ 3σ
Background Model II
Modelling the background pixel with a Gaussian: μ = 106.6, σ = 1.0573
If the value of this pixel in any video frame does not obey the 3-sigma rule, i.e., is outside the range
106.6 ± 3 × 1.0573 = [103.4, 109.7]
then we can consider it as not explained by our background model, and hence probably due to motion.
Background Model Learning
Normally an offline process
In the test video sequence supplied, there are 200 video frames (8 seconds) available for background learning
Gaussian parameters are calculated per pixel, where N is the number of background frames:
μ = (1/N) Σ xᵢ,  σ² = (1/N) Σ xᵢ² − μ²
While iterating through the background frames, we need to keep only two running values per pixel:
• sum of pixel values, and
• sum of pixel values squared
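A possible implementation of this per-pixel accumulation (a sketch assuming NumPy; array and function names are illustrative):

```python
import numpy as np

def learn_background(frames):
    """Estimate per-pixel Gaussian parameters (mu, sigma) from N background frames.

    frames: iterable of 2-D greyscale arrays (e.g. the 200 training frames).
    Only the two running sums described above are kept.
    """
    sum_x, sum_x2, n = None, None, 0
    for f in frames:
        f = f.astype(np.float64)
        if sum_x is None:
            sum_x = np.zeros_like(f)
            sum_x2 = np.zeros_like(f)
        sum_x += f          # sum of pixel values
        sum_x2 += f * f     # sum of pixel values squared
        n += 1
    mu = sum_x / n
    # sigma^2 = E[x^2] - mu^2 (clamped to avoid tiny negatives from rounding)
    sigma = np.sqrt(np.maximum(sum_x2 / n - mu * mu, 0.0))
    return mu, sigma
```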
Background Subtraction II
Given a frame f_t at time t and background model (μ, σ):
m_t(x, y) = 1 if |f_t(x, y) − μ(x, y)| > 3σ(x, y), 0 otherwise
Background subtraction is highly parallelisable
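A vectorised sketch of this test, assuming NumPy and the mu, sigma arrays produced by the learning sketch above:

```python
import numpy as np

def background_subtract(frame, mu, sigma):
    """Binary motion mask: 1 where a pixel breaks the 3-sigma rule of its Gaussian."""
    d = np.abs(frame.astype(np.float64) - mu)
    return (d > 3.0 * sigma).astype(np.uint8)
```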
Motion Detection results
Background subtraction running on the supplied test video:
Motion Detection results II
Some observations:
Issue of shadows
Noise in motion result (false positives)
These are mostly 'isolated' pixels, i.e., not supported by their neighbours!
Eliminating noise in the motion bitmap
Motion pixels not supported by their neighbours are most probably false
For each foreground (motion) pixel:
Count the number of neighbouring pixels that are also foreground
If count < K, then set the pixel to background (i.e. not a motion pixel)
Note: K is typically set to 3
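A sketch of this filter (assuming NumPy; plain array slicing keeps it dependency-light):

```python
import numpy as np

def remove_isolated_pixels(mask, K=3):
    """Clear foreground pixels with fewer than K foreground neighbours (8-connectivity)."""
    padded = np.pad(mask, 1, mode='constant')
    # Sum of the 8 neighbours of every pixel.
    neighbours = (padded[:-2, :-2] + padded[:-2, 1:-1] + padded[:-2, 2:] +
                  padded[1:-1, :-2] +                    padded[1:-1, 2:] +
                  padded[2:, :-2]  + padded[2:, 1:-1]  + padded[2:, 2:])
    cleaned = mask.copy()
    cleaned[(mask == 1) & (neighbours < K)] = 0
    return cleaned
```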
Motion Detection results III
After eliminating isolated foreground pixels:
The result has improved, though there are still small groups of false positives...
...these will be eliminated later when we do blob size filtering.
Background Model Update
The background model can get out of date!
Variation in illumination levels
Sudden changes (e.g., clouds, specular reflections)
Gradual changes (e.g., light levels during the day)
Objects can become stationary (e.g., a car is parked)
The background model needs updating
Running average: μ_t = α f_t + (1 − α) μ_{t−1}
Typically α is set to something like 0.05
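A sketch of the running-average update (assuming NumPy; updating σ as well, or updating only pixels currently classified as background, are common refinements not shown here):

```python
import numpy as np

def update_background(mu, frame, alpha=0.05):
    """Running average: mu_t = alpha * f_t + (1 - alpha) * mu_(t-1)."""
    return alpha * frame.astype(np.float64) + (1.0 - alpha) * mu
```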
From foreground pixels to blobs
The motion detection process treats each pixel independently of the others
We must now group the foreground pixels together to form 'blobs'
Hopefully a blob will correspond to an object in the real world
Not always a 1-1 correspondence:
Object fragmentation, e.g., caused by misdetection
Object merging, e.g., caused by occlusion
The algorithm used here for grouping foreground pixels is called Connected Component Labelling
Connected Component Labelling
Algorithm:
Input: motion bitmap
Output: a set of labelled connected components (blobs)
A 2-pass algorithm, i.e., the image data is traversed twice
Using 8-connectivity in this case
Connected Component Labelling II
Algorithm, on the 1st pass:
1. Iterate through each pixel:
2. If the pixel is foreground:
   1. Get the foreground neighbours of the pixel
   2. If none found, uniquely label the current pixel and continue
   3. Otherwise, find the neighbour with the smallest label and assign it to the current pixel
   4. If more than one label is found, store the equivalence between neighbouring labels
On the 2nd pass:
1. Iterate through each pixel:
2. If the pixel is foreground:
   1. Re-label the pixel with the lowest equivalent label
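A sketch of the two-pass algorithm (assuming a NumPy 0/1 motion mask; the equivalence table is kept in a small union-find structure, and only the already-visited neighbours W, NW, N, NE are examined during the raster scan):

```python
import numpy as np

def connected_components(mask):
    """Two-pass connected component labelling with 8-connectivity.

    mask: 2-D array of 0/1. Returns an int array of blob labels (0 = background).
    """
    labels = np.zeros(mask.shape, dtype=np.int32)
    parent = [0]                            # union-find table; parent[l] == l for roots

    def find(l):
        while parent[l] != l:
            parent[l] = parent[parent[l]]   # path halving
            l = parent[l]
        return l

    h, w = mask.shape
    next_label = 1
    # 1st pass: label pixels, recording equivalences between touching labels.
    for y in range(h):
        for x in range(w):
            if not mask[y, x]:
                continue
            neigh = [labels[y, x - 1] if x > 0 else 0,
                     labels[y - 1, x - 1] if y > 0 and x > 0 else 0,
                     labels[y - 1, x] if y > 0 else 0,
                     labels[y - 1, x + 1] if y > 0 and x < w - 1 else 0]
            neigh = [l for l in neigh if l > 0]
            if not neigh:
                parent.append(next_label)   # new label is its own root
                labels[y, x] = next_label
                next_label += 1
            else:
                smallest = min(neigh)
                labels[y, x] = smallest
                for l in neigh:             # store the equivalences
                    ra, rb = find(l), find(smallest)
                    if ra != rb:
                        parent[max(ra, rb)] = min(ra, rb)
    # 2nd pass: re-label each pixel with its lowest equivalent label.
    for y in range(h):
        for x in range(w):
            if labels[y, x]:
                labels[y, x] = find(labels[y, x])
    return labels
```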
Blob Descriptors
The following blob information needs to be extracted:
Blob area/size, i.e., the number of pixels
Blob bounding box
This information can be collected during Connected Component Labelling to avoid performing multiple passes over the image
Blob extraction results
After performing connected component labelling:
Note the small spurious blobs caused by noise in the motion detector
These will be solved through blob size filtering
Blob Size Filtering
Small blobs are caused by noise in the motion detection process
Solution: filter blobs by their area
For each blob B:
If its area < minSize:
Delete blob B
Note: minSize = 50 is a good threshold for the video sequence used in this assignment
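A sketch combining blob descriptor collection with size filtering (assuming the label image from the labelling sketch above; the descriptor fields match the 'Blob Descriptors' slide):

```python
def extract_blobs(labels, min_size=50):
    """Collect area and bounding box per blob, then apply blob size filtering."""
    blobs = {}  # label -> {'area', 'x0', 'y0', 'x1', 'y1'}
    h, w = labels.shape
    for y in range(h):
        for x in range(w):
            l = labels[y, x]
            if l == 0:
                continue  # background pixel
            b = blobs.setdefault(l, {'area': 0, 'x0': x, 'y0': y, 'x1': x, 'y1': y})
            b['area'] += 1
            b['x0'] = min(b['x0'], x); b['x1'] = max(b['x1'], x)
            b['y0'] = min(b['y0'], y); b['y1'] = max(b['y1'], y)
    # Blob size filtering: delete blobs with area < min_size.
    return {l: b for l, b in blobs.items() if b['area'] >= min_size}
```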
Blob Size Filtering results
After performing blob size filtering:
Number of initial blobs found: 92; number of blobs after filtering: 7
Final result
These 3 marbles appear merged into 1 blob because of partial occlusion and due to the merging of shadows.
Our motion detector has no shadow suppression component (it is beyond the scope of this assignment).
Short note on Shadow Suppression
Shadow detection and suppression is based on the observation that where a shadow falls, the background gets darker, but its colour remains unchanged
Must use a colour space that separates colour into luminance and chrominance components, so RGB can't be used:
HSV (Hue, Saturation, Value): converting from RGB to HSV is expensive
YCbCr (Luminance, Blue-difference, Red-difference): luminance channel Y, chrominance channels Cb, Cr
Short note on Shadow Suppression II
Using the method by Horprasert et al., "Robust Background Subtraction and Shadow Detection", ACCV, 2000
Motion detection is performed on channels Y, Cb, Cr, i.e., 3 Gaussians per pixel
Classify image pixels as:
• Background if Y, Cb, Cr are all within the range μ ± 3σ
• Shaded Background if Y < μ − 3σ, and Cb, Cr are within the range μ ± 3σ
• Highlighted Background if Y > μ + 3σ, and Cb, Cr are within the range μ ± 3σ
• Foreground in all other cases, where Cb and/or Cr are not within the range μ ± 3σ
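A per-pixel sketch of this classification (assuming per-channel Gaussian models learned as before; the dict-based model representation and names are illustrative):

```python
BACKGROUND, SHADOW, HIGHLIGHT, FOREGROUND = 0, 1, 2, 3

def classify_pixel(y, cb, cr, mu, sigma):
    """Classify one pixel given its Y, Cb, Cr values and per-channel Gaussians.

    mu, sigma: dicts keyed by 'Y', 'Cb', 'Cr' holding the per-pixel model values.
    """
    def within(val, c):
        return abs(val - mu[c]) <= 3.0 * sigma[c]

    if not (within(cb, 'Cb') and within(cr, 'Cr')):
        return FOREGROUND          # chrominance changed: a real object
    if within(y, 'Y'):
        return BACKGROUND          # all channels within mu +/- 3 sigma
    if y < mu['Y'] - 3.0 * sigma['Y']:
        return SHADOW              # darker, same colour: shaded background
    return HIGHLIGHT               # brighter, same colour: highlighted background
```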
Short note on Shadow Suppression III
Example results comparing motion detection without shadow removal and with shadow removal, with pixels classified as shadow, highlight, foreground and background.
Object Tracking
So far each video frame has been processed independently
The result at the end of each video frame is a list of blobs
We need to keep track of the detected blobs across video frames (across time)
Maintaining the identity of objects throughout the video sequence
(Example frames shown at time t and time t+n)
Object Tracking II
Object tracking can be considered as a correspondence problem:
Matching objects tracked in previous video frames (f_0 .. f_{t−1}) to the blobs of video frame f_t.
In tracking, the blobs of frame f_t are sometimes referred to as "observations"
Not always a 1-1 correspondence:
Objects may become occluded or partially occluded, e.g. a person walking behind a car
Objects may appear to fragment, e.g. due to misdetection of part of an object (camouflage)
Objects may appear to merge together, e.g. caused by shadow
Objects may physically combine into one, e.g. a person enters a slowly-moving car
An object may physically split into multiple ones, e.g. a person leaves a bag behind
Tracked Object Matching Criteria I
Can use:
Spatial proximity
E.g. nearest object, object intersection, bounding box overlap, etc.
Disadvantage: very simple; usually not enough on its own
Area and size
Disadvantage: very simple; usually not enough on its own
Tracked Object Matching Criteria II
Can use:
Colour information
E.g. dominant colour(s), histogram, etc.
Advantage: robust to partial occlusion
Disadvantage: might fail when objects move into shadowed areas, or rotate to show differently coloured/textured sides
Tracked Object Matching Criteria III
Can use:
Shape similarity
E.g. object shape features (like eccentricity, skew, circularity, etc.), boundary, skeleton, snakes (active contours), etc.
Disadvantage: with the exception of snakes, might fail in the case of non-rigid objects (e.g. people)
Tracked Object Matching Criteria IV
Can use:
Motion information
E.g. speed, trajectory prediction (Kalman filter), etc.
Disadvantage: might fail on non-rigid objects or (for some of the methods) in the case of non-affine motion
Tracked Object Matching Criteria V
Can use:
Local features
E.g. corners (KLT features), SIFT (Scale Invariant Feature Transform), etc.
Disadvantage: tracking result depends on the quality of the selected features; may fail on objects containing little texture
Tracked Object Matching Criteria VI
Can use:
Template matching
E.g. region correlation, etc.
Disadvantage: might fail on non-affine motion; can be very sensitive to brightness and size variation
Optical flow
Using the apparent motion of brightness patches in an image
Disadvantage: computationally expensive; requires objects to be continually moving
Many others...
Match Score Function
Matching of objects {O} to blobs {B} is normally performed through some set of weighted matching criteria {f_k}, giving a match score matrix:
score(O_i, B_j) = w_1 f_1(O_i, B_j) + w_2 f_2(O_i, B_j) + ... + w_k f_k(O_i, B_j)
where w_1 + w_2 + ... + w_k = 1.0,
O = {O_i}, i = 1..N, and B = {B_j}, j = 1..M
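A direct sketch of the weighted score (the criterion functions are illustrative placeholders; each f_k is assumed to return a similarity in [0, 1]):

```python
def match_score(obj, blob, criteria):
    """Weighted sum of matching criteria.

    criteria: list of (weight, f) pairs, where the weights sum to 1.0 and
    each f(obj, blob) returns a similarity in [0, 1].
    """
    return sum(w * f(obj, blob) for w, f in criteria)

def build_score_matrix(objects, blobs, criteria):
    """score[i][j] = match score between tracked object i and observation j."""
    return [[match_score(o, b, criteria) for b in blobs] for o in objects]
```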
Object Tracking Algorithm
Algorithm:
For each tracked object O and each blob (observation) B detected in video frame f_t:
Compute the match score value and store the result in a score matrix
Using the score matrix:
Try to match each blob to one or more tracked objects
If a unique match is found:
Update the tracked object with the new blob information
If a non-unique match is found:
Handle the case of merged objects (e.g., best match, or matching to parts of the blob)
Handle the case of an object splitting into multiple blobs
If no match is found:
Is this a potentially new tracked object?
If yes, initialise a new tracked object with the given blob
For any tracked objects not matched to blobs in frame f_t:
Mark these as 'lost' in the current frame.
If a tracked object has been 'lost' for a long time, delete it.
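A simplified, greedy sketch of this loop (one-to-one matching only; merge/split handling and globally optimal assignment are omitted; min_score, max_lost, and the TrackedObject class sketched under 'Tracked Object State' below are illustrative):

```python
def track_frame(tracked, blobs, criteria, min_score=0.5, max_lost=25):
    """One tracking step: match existing objects to this frame's blobs."""
    scores = build_score_matrix(tracked, blobs, criteria)
    matched_blobs = set()
    for i, obj in enumerate(tracked):
        # Best-scoring still-unmatched blob for this object (greedy; a real
        # system also handles merges, splits and ambiguous matches).
        best = max((j for j in range(len(blobs)) if j not in matched_blobs),
                   key=lambda j: scores[i][j], default=None)
        if best is not None and scores[i][best] >= min_score:
            obj.update(blobs[best])        # unique match: refresh the object
            matched_blobs.add(best)
        else:
            obj.mark_lost()                # not seen in this frame
    # Unmatched blobs are potentially new objects entering the scene.
    new_objects = [TrackedObject(blobs[j])
                   for j in range(len(blobs)) if j not in matched_blobs]
    # Delete objects 'lost' for too long (e.g. one second at 25 fps).
    survivors = [o for o in tracked if o.frames_lost <= max_lost]
    return survivors + new_objects
```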
Tracked Object State
Object properties:
Unique ID
Flag indicating if the object is visible in the current video frame
Age (i.e. for how many video frames it has been tracked)
When last seen (video frame #)
Flags indicating if currently in a merged state, has fragmented, etc.
Trajectory
List of image positions of the object in the previous video frames
Some form of smoothing may be required, e.g., window-based averaging
Can be used for estimating future position (e.g. Kalman filter)
Object descriptors, depending on the chosen tracking method and criteria, for example:
Bounding box
Object centre
Size
Colour information, local features, etc.
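A minimal container for this state (a sketch; the field names are illustrative and match the tracking-loop sketch above):

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)

@dataclass
class TrackedObject:
    blob: dict                                      # latest matched blob descriptors
    id: int = field(default_factory=lambda: next(_next_id))
    age: int = 0                                    # frames tracked so far
    frames_lost: int = 0                            # consecutive frames without a match
    visible: bool = True
    merged: bool = False
    trajectory: list = field(default_factory=list)  # past image positions

    def update(self, blob):
        """Unique match found: refresh state with the new observation."""
        self.blob = blob
        self.age += 1
        self.frames_lost = 0
        self.visible = True
        cx = (blob['x0'] + blob['x1']) / 2.0
        cy = (blob['y0'] + blob['y1']) / 2.0
        self.trajectory.append((cx, cy))

    def mark_lost(self):
        self.visible = False
        self.frames_lost += 1
```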
Tracking example I
The trajectory and unique ID are shown; the red outline is the bounding box of the motion detection blob; the white boundary represents the tracked object.
Tracking example II
Trajectory smoothing...
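Window-based trajectory smoothing, as mentioned under 'Tracked Object State' (a sketch; the window size is illustrative):

```python
def smooth_trajectory(points, window=5):
    """Moving-average smoothing of a list of (x, y) image positions."""
    half = window // 2
    smoothed = []
    for i in range(len(points)):
        lo, hi = max(0, i - half), min(len(points), i + half + 1)
        xs = [p[0] for p in points[lo:hi]]
        ys = [p[1] for p in points[lo:hi]]
        smoothed.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return smoothed
```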
Tracking example III
Special handling of merged objects: the ID of object 28 is correctly maintained during partial occlusion.
Tracking example IV
Without special handling of merged objects, object identity is lost: object 30 changes ID to 32, and object 30 is lost.
Tracking Demos
Let's look at some motion detection and object tracking videos…
Application of Object Tracking
We will look at one application of interest to Computer Graphics: Motion Capture
Recording the movements of actors
Mapping the movements on to a digital model
Motion Capture I
Non-CV motion capture:
Electro-mechanical exoskeleton systems
Electro-magnetic systems
Inertial systems
Advantages:
Can be quite precise
No occlusion problems
Simpler data processing
Disadvantages:
Restrictive motion
Difficult to capture facial expressions
Require manual configuration
(Image source: Animazoo)
Motion Capture II
Optical motion capture systems:
Require a multi-camera system
Use object tracking algorithms
3D position of markers obtained via stereo vision
Can suffer from problems like occlusion, tracking errors, identity swaps, etc.
Can be classified into:
Optical marker-based approaches (most commonly used)
Optical marker-less systems (still an active research area, though a commercial system is already available: Organic Motion)
Marker-based Motion Capture I
Multi-camera system
Actor with reflective markers
Marker-based Motion Capture II
Allows for facial expression capture
(Image sources: Sony Pictures ImageWorks, Mocaplab)
Marker-based Motion Capture III
3D marker trajectories / motion curves captured through object tracking
Transfer of motion to 3D models
Marker-based Motion Capture IV
Some Demos
Markerless Motion Capture
Body part and shape tracking
Silhouette-based tracking
Pose reconstruction via model fitting
May require physics-based post-processing for tracking error correction
Demo of the system by D. Vlasic et al., MIT, 2008.