automatic site monitoring using visual data

71
Improved Robustness and Efficiency for Automatic Visual Site Monitoring Gerald Dalley Thesis Defense 9 June 2009 Committee: Eric Grimson, Trevor Darrell, and Bill Freeman Collaborators: Xiaogang Wang, Josh Migdal, Kinh Tieu, Lily Lee, Tomáš Ižo,…

Upload: others

Post on 16-Oct-2021

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Automatic Site Monitoring Using Visual Data

Improved Robustness and Efficiency for Automatic Visual Site Monitoring

Gerald DalleyThesis Defense

9 June 2009

Committee: Eric Grimson, Trevor Darrell, and Bill Freeman

Collaborators: Xiaogang Wang, Josh Migdal, Kinh Tieu, Lily Lee, Tomáš Ižo,…

Page 2: Automatic Site Monitoring Using Visual Data

Commercial and Transportation Applications

2

• Efficiency– What are the traffic bottlenecks?– How can we coordinate arrival schedules to minimize

congestion?

• Marketing– How do in-store marketing campaigns effect behavior?– Are shoppers stopping at the sales booth?

• Loss prevention– How can we detect customer theft?– How can we detect employee theft?

Page 3: Automatic Site Monitoring Using Visual Data

Security Applications

• Threat detection– Unauthorized access

– Violence

– Theft

– Tailing

– Loitering

– Sudden widespread panic

• Recognition– Is this person authorized?

– Is this a “wanted” person?

• Activity understanding– What are the common

traffic patterns?

– How can we deploy security resources more effectively?

3

Page 4: Automatic Site Monitoring Using Visual Data

Applications & Typical Scenes

4

Page 5: Automatic Site Monitoring Using Visual Data

Automatic Site Monitoring Pipeline

5

Detection

Tracking

Analysis

Page 6: Automatic Site Monitoring Using Visual Data

Detection

Tracking

Analysis

Automatic Site Monitoring Pipeline

6

Feature Points

Page 7: Automatic Site Monitoring Using Visual Data

Detection

Tracking

Analysis

Automatic Site Monitoring Pipeline

7

• Background subtraction– Stauffer and Grimson, CVPR 1999.

– Boykov, Veksler, and Zabih, PAMI 2001.

– Mittal and Paragios, CVPR 2004.

– Sheikh and Shah, CVPR 2005.

– Dalley, Migdal, and Grimson, WACV 2008.

• Feature points– Shi and Tomasi, CVPR 1994.

• Strong models– Gavrila, ECCV 2000.

– Leibe, Seeman, and Schiele, CVPR 2005.

– Dalal and Triggs, CVPR 2005.

– Zhu, Yeh, Cheng, and Avidan, CVPR 2006.

– Wojek, Dorkó, Schulz, and Schiele, DAGM 2008.

Page 8: Automatic Site Monitoring Using Visual Data

• Kalman filter• Meanshift• …

Automatic Site Monitoring Pipeline

8

Detection

Tracking

Analysis

Time windowing: for rendering purposes only

Page 9: Automatic Site Monitoring Using Visual Data

Detection

Tracking

Analysis

Automatic Site Monitoring Pipeline

9

• Identifying individual people– Phillips et al. ICPR 2002.

– Sundaresan, Roy-Chowdhury, and Chellapa, ICIP 2003.

– Lee, Dalley, and Tieu, ICCV 2003.

– Veeraraghavan, Roy-Chowdhury, and Chellappa, PAMI2005.

• Recognize events (loitering, theft, etc.)– Ivanonv and Bobick, PAMI 2000.

– Vu, Bremond, and Thonnat, ECAI 2002.

– PETS 2006 and PETS 2007 workshops (many papers)

• Dalley, Wang, and Grimson, PETS 2007.

• Model flow patterns and site usage– Stauffer, CVPR 1999.

– Andrade, Blunsden, and Fisher, ICPR 2006.

– Wang, Ma, and Grimson, CVPR 2007.

– Wang et al., CVPR 2008.

Page 10: Automatic Site Monitoring Using Visual Data

Thesis Contributions

• Background subtraction– Waving trees, rippling water 5.5% drop in false positive rate

• Large-scale monitoring– Clustering of path segments – Dalal and Triggs on a GPU Up to 76x faster than CPU

• Gait recognition– Model-based silhouettes 6%—44% boost in

recognition rates

• Event detection– Integrated detection and tracking Only system to complete

the PETS 2007 challenge

10

Page 11: Automatic Site Monitoring Using Visual Data

This talk…

11

Detections

Site Activity Model

High-res Video

Page 12: Automatic Site Monitoring Using Visual Data

Outline

• Motivation

• Activity model overview

• Weak model detectors

• Strong model detector

• Data parallel implementation

• Summary

12

Page 13: Automatic Site Monitoring Using Visual Data

Outline

• Activity model overview

• Weak model detectors

• Strong model detector

• Data parallel implementation

13

Page 14: Automatic Site Monitoring Using Visual Data

High Level

• Goal

– Cluster trajectories to find common paths

• Approach

– Infinite mixture model

14

Page 15: Automatic Site Monitoring Using Visual Data

Cluster Index

Mix

ing

Wei

gh

t

Hierarchical Dirichlet Processes (HDPs)• HDPs: Teh JASA 2006

• w/ trajectories: Wang CVPR 2008

15

J

Ij

wji

zji

πj

φc λ

α

β

γ

17

φ17

Mix

ing

Wei

gh

t

Page 16: Automatic Site Monitoring Using Visual Data

Outline

• Activity model overview

• Weak model detectors

– Background subtraction

– Feature point detection

• Strong model detector

• Data parallel implementation

16

Page 17: Automatic Site Monitoring Using Visual Data

Background Subtraction

17

Page 18: Automatic Site Monitoring Using Visual Data

Background Subtraction:

Precision-Recall

18

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Recall

Pre

cisi

on

background subtraction

Ideal Corner

Page 19: Automatic Site Monitoring Using Visual Data

Background Subtraction:

Problems

19

Split blobs, missing people:glare, too frequent foreground

Merged blobs:shadows, crowds

Page 20: Automatic Site Monitoring Using Visual Data

Alternative:Shi & Tomasi Feature Point Detection

20

true positives false positives false negatives

true negatives don’t care

Page 21: Automatic Site Monitoring Using Visual Data

Improved Recall, but Low Precision

21

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Recall

Pre

cisi

on

background subtraction

Ideal Corner

feature points

Page 22: Automatic Site Monitoring Using Visual Data

Clustering Feature Point Trajectories

22

φ2

φ1

Mix

ing

Wei

gh

t

φ3

Page 23: Automatic Site Monitoring Using Visual Data

Perplexity (cluster uncertainty given observed location)

23

perplexity = 1.5

Page 24: Automatic Site Monitoring Using Visual Data

Crowded bidirectional traffic

24

Page 25: Automatic Site Monitoring Using Visual Data

Most Tracks Just Going East and West

25

Page 26: Automatic Site Monitoring Using Visual Data

A Few Bad Tracks Couple East and West

26

Normal westbound track

Normal eastbound track

Bad track #1

Bad track #2

Eastbound & #1 associated

Westbound & #2 associated

#1 & #2 associated

Track starting point

Page 27: Automatic Site Monitoring Using Visual Data

Outline

• Activity model overview

• Weak model detectors

• Strong model detector

– Dalal and Triggs' HOG detector

– Classification results

– Activity modeling results

• Data parallel implementation

27

Page 28: Automatic Site Monitoring Using Visual Data

Dalal & Triggs HOG Features

28

At every possible pedestrian location

At every block

descriptor location

Input

Window

Gamma

Correct

Block

Descr.

Locs.

Gradients

Voting

Stencil

Block

Descr.

Window

Descriptor

Page 29: Automatic Site Monitoring Using Visual Data

Sufficient Precision and Recall

29

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Recall

Pre

cisi

on

background subtraction

Ideal Corner

feature points

HOG detector

Lower precision, but fewer systematic

errors

Page 30: Automatic Site Monitoring Using Visual Data

Better Perplexity

30

Point Trackingmean = 1.5

median = 1.1

Pedestrian Detector Tracksmean = 2.6

median = 2.4

Page 31: Automatic Site Monitoring Using Visual Data

Selected Clusters

31

SE Escalator SE NW

SE SW SW SE

Page 32: Automatic Site Monitoring Using Visual Data

Breaking up Merged Paths

32

More permissive priors Can separate the 6 paths from west to escalators

Page 33: Automatic Site Monitoring Using Visual Data

Some Directional Degeneracies Remain

33

Cause: tracking errors

Cause: loitering and meandering

Page 34: Automatic Site Monitoring Using Visual Data

Outline

• Activity model overview

• Weak model detectors

• Strong model detector

• Data parallel implementation

– Motivation

– GPU Intuition

– Our design

– Speedups

34

Page 35: Automatic Site Monitoring Using Visual Data

Good Results, but Too Slow

35

years compute 8 hours 40

videoofsec.

frames30

frame

.seccompute60

days compute 75 videoofhour 1sec.

frames30

frame

.seccompute60

…a little faster would be nice.

Our data:• 40 hours• 1920×1080 frames

• 6.75× the pixels/frame w.r.t. 640×480• 27× the pixels/frame w.r.t. 320×240

• progressive scan

Page 36: Automatic Site Monitoring Using Visual Data

CPU Characteristics

• One thing fast– High clock speed

– Pipelining

• Complex control flow– Cache

– Branch prediction

– Speculation

– …

36

• Task parallel: a few different things fast– Multicore

– Hyperthreading

– Sophisticated caches

• Data parallel:Same instruction on a few data items– MMX, SSE, etc.

Page 37: Automatic Site Monitoring Using Visual Data

GPU Characteristics

• Same instruction, many data items– 240 “cores” or more

• Very high memory bandwidth – 10× a CPU’s

• Typical speedups: – 10×—100×

• Programming– Style: C/C++

– Optimization effort ≈ C++ & assembly mix

• Slow if…– Insufficiently parallel code

– Random memory access

– Branching

37

Page 38: Automatic Site Monitoring Using Visual Data

Intuition: What Works Well on a GPU

• In general

– 10<<<MANY>>> independent inputs and/or outputs

– Localized memory access

• Typical applications

– Filterbanks

– Sliding window algorithms

– Code that’s easy to vectorize in Matlab

38

Page 39: Automatic Site Monitoring Using Visual Data

Dalal & Triggs HOG Features

39

At every possible pedestrian location

At every block

descriptor location

Input

Window

Gamma

Correct

Block

Descr.

Locs.

Gradients

Voting

Stencil

Block

Descr.

Window

Descriptor

Page 40: Automatic Site Monitoring Using Visual Data

Our CUDA Pipeline

40

for each

frame

for all scales

Read Framememcpy to

GPU

Crop and

Resample

Normalized Block

Descriptors

Linear SVM

Classification

memcpy to

CPU

MeanshiftNon-maxima

Suppression

Kalman

TrackingOutput

Gamma Correction

and Gradients

Data

ReorderingDecode Frame

Runs on CPU (1 thread)

Runs on GPU Possible on GPU

CPU ↔ GPU transfers

Page 41: Automatic Site Monitoring Using Visual Data

CPU vs. GPU Times:Results from a Simplified Profiling Application

Processing StepCPU

ImplementationGPU

ImplementationGPU

Speedup

Read input (CPU) 0% 17%

GPU resizer setup 5%

Resize 4% 11% 24.3×

Gradients 24% 9% 164.0×

Normalized block descriptors 57% 35% 97.7×

Window classification 14% 8% 100.6×

Cleanup 0% 12% 0.5×

Detection (CPU) 0% 4% 1.1×

TOTAL 23 seconds 0.4 seconds 58.8×

41

Page 42: Automatic Site Monitoring Using Visual Data

GPU Speedup Results

• Our Implementation– 58.8× to 76× speedup (vs. optimized CPU-only)– Current bottlenecks

• Video decoding on the CPU (17%)• Block descriptors (35%)• Bookkeeping & memory transfers (17%)

• Wojek, Dorkó, Schulz, Schiele [DAGM 2008]– 30× speedup– Optimized for the previous GPU architecture– Less efficient usage of memory bandwidth

42

Page 43: Automatic Site Monitoring Using Visual Data

Summary

• Fast HOG implementation

– 58.8× to 76× speedup

• Better clustering of trajectory flows

– Qualitative improvements

– Perplexity

43

Page 44: Automatic Site Monitoring Using Visual Data

Future Work

• Scale to true HD real-time – Multithreaded CPU

– Multiple GPUs

– Asynchronous data transfers

– More computation to GPUs

• Better HOG training– Explicit occlusion handling

– Add video features (a la Dalal and Triggs 2006)

• Alternative detectors– Boosted cascade on GPUs

(CPU: Avidan; Viola & Jones)

• Activity modeling– Learn long-term flow trends

– Temporal dependencies (via HMMs)

• Integrate with other technologies in this thesis…

44

Page 45: Automatic Site Monitoring Using Visual Data

Appearance models for recognition

Multimodal trackingfor event detection

Other Potential Applicationsfor Fast and Robust Pedestrian Detection

45

Abandoned Luggage!

Page 46: Automatic Site Monitoring Using Visual Data

Acknowledgments

• Thesis Committee

– Eric Grimson

– Bill Freeman

– Trevor Darrell

• Too many friends and fellow students here at MIT to list individually…

• Collaborators

– Xiaogang Wang

– Josh Migdal

– Kinh Tieu

– Lily Lee

– Tomáš Ižo

– Jim Sukha, Krista Ehinger, and Geza Kovacs

46

• Funders

– DARPA

– MIT

– Shell

– Singapore

– …

• Work Experiences

– Microsoft Research

– MERL

– BAE Systems

– D.E. Shaw

• My wife, Dianna

Page 47: Automatic Site Monitoring Using Visual Data

47

Page 48: Automatic Site Monitoring Using Visual Data

Detection

Tracking

Analysis

Automatic Site Monitoring Pipeline

48

• Background subtraction

– Stauffer and Grimson. Adaptive Background Mixture Models for Real-time Tracking. CVPR. 1999.

– Boykov, Veksler, and Zabih. Fast Approximate Energy Minimization via Graph Cuts. PAMI. 2001.

– Mittal and Paragios. Motion-based Background Subtraction using Adaptive Kernel Density Estimation. CVPR. 2004.

– Migdal and Grimson. Background Subtraction using Markov Thresholds. MVC. 2005.

– Sheikh and Shah. Bayesian Object Detection in Dynamic Scenes. CVPR. 2005.

– Dalley, Migdal, and Grimson. Background Subtraction for Temporally Irregular Dynamic Textures. WACV. 2008.

• Feature points

– Shi and Tomasi. Good Features to Track. CVPR. 1994.

• Strong models

– Gavrila. Pedestrian Detection from a Moving Vehicle. ECCV. 2000.

– Leibe, Seeman, and Schiele. Pedestrian Detection in Crowded Scenes. CVPR. 2005.

– Dalal and Triggs. Histograms of Oriented Gradients for Human Detection. CVPR. 2005.

– Zhu, Yeh, Cheng, and Avidan. Fast Human Detection using a Cascade of Histograms of Oriented Gradients. CVPR. 2006.

– Wojek, Dorkó, Schulz, and Schiele. Sliding-windows for Rapid Object Class Localization: A Parallel Technique. DAGM. 2008.

Page 49: Automatic Site Monitoring Using Visual Data

Detection

Tracking

Analysis

Automatic Site Monitoring Pipeline

49

• Identifying individual people– Sinha, Balas, Ostrovsky, and Russell. Face Recognition by Humans: Nineteen Results All

Computer Vision Researcher Should Know About. IEEE. 2006.– Phillips et al. The Gait Identification Challenge Problem: Data Sets and Baseline

Algorithm. ICPR. 2002.– Sundaresan, Roy-Chowdhury, and Chellapa. A Hidden Markov Model Based Framework

for Recognition of Humans from Gait Sequences. ICIP. 2003.– Lee, Dalley, and Tieu. Learning Pedestrian Models for Silhouette Refinement. ICCV.

2003.– Veeraraghavan, Roy-Chowdhury, and Chellappa. Matching Shape Sequences in Video

with Applications in Human Movement Analysis. PAMI. 2005.

• Recognize events (loitering, theft, etc.)– Ivanonv and Bobick. Recognition of Visual Activities and Interactions by Stochastic

Parsing. PAMI. 2000.– Vu, Bremond, and Thonnat . Temporal Constraints for Video Interpretation. ECAI. 2002.– Francois et al. VERL: An Ontology Framework for Representing and Annotating Video

Events. Multimedia. 2005.– PETS 2006 and PETS 2007 workshops (many papers)– Dalley, Wang, and Grimson. Event Detection using an Attention-based Tracker.

PETS. 2007.

• Model flow patterns and site usage– Stauffer. Automatic Hierarchical Classification using Time-based Co-occurrences. CVPR.

1999.– Andrade, Blunsden, and Fisher. Modeling Crowd Scenes for Event Detection. ICPR. 2006.– Wang, Ma, and Grimson. Unsupervised Activity Perception by Hierarchical Bayesian

Models. CVPR. 2007.– Wang et al. Trajectory Analysis and Semantic Region Modeling using a Nonparametric

Bayesian Model. CVPR. 2008.

Page 50: Automatic Site Monitoring Using Visual Data

Same Quality(minor differences due to training tweaks)

50

Dalal & Triggs’ CPU ImplementationOur GPU Implementation

Due to some minor training artifacts

Page 51: Automatic Site Monitoring Using Visual Data

A Learned Classification Boundary (rotated)

51

Page 52: Automatic Site Monitoring Using Visual Data

Detections on One Frame

52

gege

Page 53: Automatic Site Monitoring Using Visual Data

ROC Curves

0 0.05 0.1 0.15 0.2 0.250

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

False Positive Rate

Tru

e P

osi

tiv

e R

ate

pedestrian detection: all non-masked

corner detection: all non-masked

backround subtraction: all non-masked

chance classifier

53

Page 54: Automatic Site Monitoring Using Visual Data

Training Set Influence

54

Page 55: Automatic Site Monitoring Using Visual Data

MRF equations[MG05]

55

Grid of observed pixel colors

Grid of unknown FG/BG labels

Page 56: Automatic Site Monitoring Using Visual Data

Detections before Global Optimization

56

Page 57: Automatic Site Monitoring Using Visual Data

Sample Marginal of Observations

57

Page 58: Automatic Site Monitoring Using Visual Data

Our Data

58

12

4 p

ixel

s

Page 59: Automatic Site Monitoring Using Visual Data

Foreground Likelihood

59

Page 60: Automatic Site Monitoring Using Visual Data

Training Data

60

Page 61: Automatic Site Monitoring Using Visual Data

Our CUDA Pipeline: Percent Time per Module

61

for each

frame

for all scales

Read Framememcpy to

GPU

Crop and

Resample

Normalized Block

Descriptors

Linear SVM

Classification

memcpy to

CPU

MeanshiftNon-maxima

Suppression

Kalman

TrackingOutput

Gamma Correction

and Gradients

Data

ReorderingDecode Frame

0%/17% 4%/11% - / 5%

4%/11%

24%/9%

57% / 35% 14% / 8% 0%/12%

0% / 4%

Page 62: Automatic Site Monitoring Using Visual Data

CPU vs. GPU Times:Results from a Simplified Profiling Application

Processing StepTime

(CPU Impl.)Time

(GPU Impl.)GPU Impl. Speedup

Read input (CPU) 68.0 ms (0%) 68.0 ms (17%)

GPU resizer setup 18.1 ms (5%)

Resize 1,045.8 ms (4%) 43.0 ms (11%) 24.3×

Gradients 5,636.2 ms (24%) 34.4 ms (9%) 164.0×

Normalized block descriptors

13,412.3 ms (57%) 137.2 ms (35%) 97.7×

Window classification 3,159.1 ms (14%) 31.4 ms (8%) 100.6×

Cleanup 23.8 ms (0%) 45.5 ms (12%) 0.5×

Detection (CPU) 16.2 ms (0%) 15.0 ms ( 4%) 1.1×

TOTAL 23,082.4 ms 392.3 ms 58.8×

62

Page 63: Automatic Site Monitoring Using Visual Data

63

Page 64: Automatic Site Monitoring Using Visual Data

Suppressing Spurious Detections

64

Page 65: Automatic Site Monitoring Using Visual Data

65

Results with Mittal & Paragios Traffic Clip

Key:

• bboxes from ground truth– True

positive

– False positive

– False negative

Page 66: Automatic Site Monitoring Using Visual Data

66

Ground truth Mittal & Paragios (2 TP, 0 FP, 1 FN)

MoG (3 TP, 3 FP, 0 FN) Ours (3 TP, 0 FP, 0 FN)

Page 67: Automatic Site Monitoring Using Visual Data

67

Pixels Affected by a GivenMixture Component

Our Model

• Mixture of Gaussians (MoG)

Standard MoG

Ours

p(cij©) /P

j2Ni wjN (ci;¹j ;§j)

ci the observed color at pixel i

© the model fwj ; ¹j ;§jgj

Ni neighborhood of pixel i

p(cij©) /P

j2Ni wjN (ci;¹j ;§j)

ci the observed color at pixel i

© the model fwj ; ¹j ;§jgj

Ni neighborhood of pixel i

Page 68: Automatic Site Monitoring Using Visual Data

68

Foreground/Background Classification

• Find best matching background Gaussian, j– Use neighborhood

• Squared Mahalanobis Distance

-

=

input mixture model

dij = (ci ¡ ¹j)T§¡1j (ci ¡ ¹j)dij = (ci ¡ ¹j)T§¡1j (ci ¡ ¹j)

φ3 φ3 φ3

dij2

Page 69: Automatic Site Monitoring Using Visual Data

69

hard Stauffer-Grimson

soft Stauffer-Grimson

best match only

update all matches

Model Update Options

Hard UpdatesSoft Updates

Lo

cal

Reg

ion

al

Page 70: Automatic Site Monitoring Using Visual Data

70

ROC (Wallflower)

Page 71: Automatic Site Monitoring Using Visual Data

71

Parameter Sensitivity

0 1000 2000 3000 4000 5000 6000 70000

0.2

0.4

0.6

0.8

1

min number of mistakes

freque

ncy

w s=7

w s=5

w s=3

w s=1

traffic

0 500 1000 1500 20000

0.05

0.1

0.15

min number of mistakes

freque

ncy

w s=7

w s=5

w s=3

w s=1

wallflower

max max