CNN-based Crowd Counting Methods Tannaz R.Damavandi Elinor Huntington


Post on 03-Aug-2020


Page 1: CNN-based Crowd Counting Methodshji/cs519_slides/CNN-based Crowd Counting … · Cross-scene crowd counting via deep convolutional neural networks (Zhang et al. Model)(2015) This

CNN-based Crowd Counting Methods

Tannaz R.Damavandi Elinor Huntington


Introduction

Crowd counting has a wide range of applications that cross the boundaries of science and engineering, such as:

● Geopolitical and civic applications

● Crowd control and public safety

● Transportation systems design and traffic control

● Counting cells or bacteria on the microscopic level

Image source: http://www.robots.ox.ac.uk/~vgg/projects/seebibyte//images/Counting3.jpg

Image source: https://i.kinja-img.com/gawker-media/image/upload/s--PgpCmwTr--/c_scale,fl_progressive,q_80,w_800/ezbhvc4qy5vdeebfwcgx.jpg


Introduction (Cont’d)

This is a challenging task: it must account for factors such as occlusion between people and the visual similarity between background features and faces in the crowd.

Image source: ShanghaiTech dataset


Background

Herbert Jacobs’ method (1967)

Crowd Count = Avg. number of people in a section * Number of sections

Drawback:

● Crowds not distributed uniformly

Solution:

● Estimate the count for each patch and add all these estimates together.
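
The contrast between the uniform-density formula and the patch-wise solution above can be sketched in a few lines. The per-patch counts below are made-up numbers chosen for illustration, and the function names are hypothetical, not taken from any of the cited works.

```python
def jacobs_estimate(avg_people_per_section, num_sections):
    """Uniform-density estimate: average people per section times section count."""
    return avg_people_per_section * num_sections


def patchwise_estimate(patch_counts):
    """Sum of independent per-patch estimates; no uniformity assumption."""
    return sum(patch_counts)


# Hypothetical per-patch counts for a crowd that is dense near a stage.
patch_counts = [40, 35, 10, 5, 0, 0]

# Sampling only the two dense patches overstates the average density.
sampled_avg = (patch_counts[0] + patch_counts[1]) / 2
uniform_total = jacobs_estimate(sampled_avg, len(patch_counts))
patch_total = patchwise_estimate(patch_counts)

print(uniform_total)  # 225.0
print(patch_total)    # 90
```

With a non-uniform crowd, the single-average estimate depends heavily on which sections were sampled, while the patch-wise sum does not.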

Image source : https://airphotoslive.com//wp-content/uploads/2013/04/crowd-counting-02.jpg


Background (Cont’d)

Most proposed automated crowd counting models cannot handle very large crowds, especially when the number of people exceeds hundreds of thousands.

Three main crowd counting methods:

● Pixel-based analysis

○ Edge info and density map (Zhang et al. model, MCNN, SCNN)

● Texture-based analysis

○ Fourier analysis

● Object level analysis

○ Locate individuals in a scene


Cross-scene crowd counting via deep convolutional neural networks (Zhang et al. model) (2015)

This model is the precursor to MCNN and SCNN.

Model:

● 3 convolutional layers

● 3 fully connected layers

● 2 max pooling layers with a 2 × 2 kernel

● Activation function: ReLU

The WorldExpo’10 crowd counting dataset was first introduced by Zhang et al. It contains 1132 annotated video sequences captured by 108 surveillance cameras, all from the Shanghai 2010 World Expo.


Zhang et al. model (Cont’d)


MCNN (Multi-Column Convolutional Neural Network)

Two natural configurations for crowd counting with CNNs:

1- Direct headcount

2- Density map of the crowd

MCNN favors the second configuration.

Advantages:

● Features learned by each column are adaptive to variations in people/head size due to perspective effect or image resolution.

● The true density map is computed accurately using geometry-adaptive kernels, which do not require the perspective map of the input image.


MCNN

Model:

● 3 parallel CNNs with local receptive fields of different sizes

● 2 max pooling layers, each applied over 2×2 regions

● Activation function: rectified linear unit (ReLU)


Data sets

Table 1 - Comparison of the ShanghaiTech dataset with existing datasets. Num is the number of images; Max is the maximal crowd count; Min is the minimal crowd count; Ave is the average crowd count; Total is the total number of labeled people.


MCNN: Density Map via Geometry-Adaptive Kernels

Accurate estimation of the crowd density

Homography between the ground plane and the image plane

The geometry of the scene

Uniform distribution of crowd around each head

Average KNN

Original images and corresponding crowd density maps obtained by convolving geometry-adaptive Gaussian kernels.
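
A minimal sketch of how such a density map could be built, assuming each annotated head is replaced by a unit-mass Gaussian whose spread is proportional to the average distance to its k nearest neighbouring heads. The values k = 3 and beta = 0.3, the function names, and the head coordinates are assumptions made for illustration. Normalising each kernel on the image grid is a simplification, and the code needs at least two heads.

```python
import math


def mean_knn_distance(heads, i, k):
    """Average distance from head i to its k nearest other heads."""
    xi, yi = heads[i]
    dists = sorted(math.hypot(xi - x, yi - y)
                   for j, (x, y) in enumerate(heads) if j != i)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def density_map(heads, height, width, k=3, beta=0.3):
    """Sum one unit-mass Gaussian per head, with sigma = beta * mean k-NN distance."""
    dmap = [[0.0] * width for _ in range(height)]
    for i, (hx, hy) in enumerate(heads):
        sigma = beta * mean_knn_distance(heads, i, k)
        kernel = [[math.exp(-((x - hx) ** 2 + (y - hy) ** 2) / (2 * sigma ** 2))
                   for x in range(width)] for y in range(height)]
        total = sum(map(sum, kernel))
        # Normalise on the grid so each person contributes exactly 1 to the sum.
        for y in range(height):
            for x in range(width):
                dmap[y][x] += kernel[y][x] / total
    return dmap


# Hypothetical head positions (x, y): three close together, one isolated.
heads = [(5, 5), (7, 5), (6, 7), (20, 20)]
dmap = density_map(heads, height=32, width=32)
estimated_count = sum(map(sum, dmap))
print(round(estimated_count, 6))  # 4.0
```

Because every kernel integrates to one, summing the density map recovers the head count. The isolated head gets a wider kernel than the three clustered ones.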


MCNN (Cont’d)

Loss function:

𝛩: the set of learnable parameters in the MCNN.

N: the number of training images.

X_i: the i-th input image, and F_i: the ground-truth density map of image X_i.

F(X_i; 𝛩): the estimated density map generated by the MCNN, parameterized by 𝛩, for sample X_i.

L: the loss between the estimated density map and the ground-truth density map.

The loss function can be optimized via batch-based stochastic gradient descent and backpropagation.
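
The equation itself did not survive in this transcript. With the symbols defined above, the MCNN paper [4] writes the loss as L(𝛩) = 1/(2N) · Σ_i ||F(X_i; 𝛩) − F_i||², a squared Euclidean distance averaged over the N training images. A minimal sketch of that computation follows, with density maps as nested lists and made-up values. The function name is hypothetical.

```python
def mcnn_loss(predicted_maps, truth_maps):
    """L(Theta) = 1/(2N) * sum_i ||F(X_i; Theta) - F_i||_2^2 over N density maps."""
    n = len(predicted_maps)
    total = 0.0
    for pred, truth in zip(predicted_maps, truth_maps):
        total += sum((p - t) ** 2
                     for pred_row, truth_row in zip(pred, truth)
                     for p, t in zip(pred_row, truth_row))
    return total / (2 * n)


# Two hypothetical 2x2 predictions and their ground-truth density maps.
predicted = [[[0.5, 0.0], [0.0, 0.5]],
             [[1.0, 0.0], [0.0, 0.0]]]
truth = [[[1.0, 0.0], [0.0, 0.0]],
         [[1.0, 0.0], [0.0, 0.0]]]
loss = mcnn_loss(predicted, truth)
print(loss)  # 0.125
```

Note that the first prediction has the correct total count (1.0) but still incurs a loss, because the loss compares maps pixel by pixel rather than counts.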


SCNN

● Switching Convolutional Neural Network

○ 3 small CNNs (aka Regressors)

○ 1 VGG16-based switch

● Images are divided into patches

● Each patch is processed by the switch and sent to one of the Regressors

● Output is a density map

https://arxiv.org/pdf/1708.00199.pdf


Data

● Each input image is divided into 9 smaller patches

● During training, the ground-truth annotation is transformed into a density map for comparison with the model output
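
One way to produce the 9 patches is a 3 × 3 grid of non-overlapping tiles. The slide does not say whether the patches overlap, so the grid layout, the function name, and the handling of leftover pixels (dropped here when a dimension is not divisible by 3) are assumptions.

```python
def split_into_patches(image, rows=3, cols=3):
    """Split a 2D image (list of rows) into rows*cols non-overlapping patches."""
    height, width = len(image), len(image[0])
    patch_h, patch_w = height // rows, width // cols
    patches = []
    for r in range(rows):
        for c in range(cols):
            patch = [row[c * patch_w:(c + 1) * patch_w]
                     for row in image[r * patch_h:(r + 1) * patch_h]]
            patches.append(patch)
    return patches


# A toy 6x6 "image" whose pixel values encode their own position.
image = [[y * 6 + x for x in range(6)] for y in range(6)]
patches = split_into_patches(image)
print(len(patches))   # 9
print(patches[0])     # [[0, 1], [6, 7]]
print(patches[8])     # [[28, 29], [34, 35]]
```

The same function can be applied to the ground-truth density map so that each image patch has a matching target.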


SCNN Regressors

● Based on the MCNN Regressor architecture

● Four convolutional layers, 2 max pooling layers, and a final 1 × 1 convolutional layer to transform features into a density map.
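
The final 1 × 1 step can be read as a per-pixel weighted sum across feature channels. The sketch below illustrates only that step. The feature maps and weights are made up rather than learned, and the function name is hypothetical.

```python
def conv1x1(features, weights, bias=0.0):
    """Collapse C feature maps into one map: per-pixel weighted sum over channels."""
    height, width = len(features[0]), len(features[0][0])
    return [[bias + sum(w * fmap[y][x] for w, fmap in zip(weights, features))
             for x in range(width)]
            for y in range(height)]


# Two hypothetical 2x2 feature maps and made-up (not learned) weights.
features = [[[1.0, 2.0], [3.0, 4.0]],
            [[0.0, 1.0], [0.0, 1.0]]]
weights = [0.5, 2.0]
density = conv1x1(features, weights)
print(density)                  # [[0.5, 3.0], [1.5, 4.0]]
print(sum(map(sum, density)))   # 9.0
```

Summing the resulting map gives the count estimate for the patch.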


SCNN Regressors

● Each regressor has a different receptive field that evaluates crowd density.

● Uses mean inter-head distance as a proxy for crowd density.


SCNN Switch

● First 5 convolutional / max pooling layers are the same as in VGG16

● Followed by a Global Average Pooling (GAP) layer and 2 fully connected layers

○ Similar to the final stages of ResNet-50

○ GAP minimizes overfitting

● Finally, a softmax layer assigns the image patch to one of the regressors
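
The last step of the switch can be sketched as a softmax over three raw scores followed by picking the most probable regressor. The scores below are made-up stand-ins for the output of the last fully connected layer, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def choose_regressor(switch_scores):
    """Return the index of the regressor with the highest softmax probability."""
    probs = softmax(switch_scores)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical raw outputs of the switch's last fully connected layer.
scores = [0.2, 2.5, -1.0]
probs = softmax(scores)
chosen = choose_regressor(scores)
print(chosen)  # 1
```

Subtracting the maximum score before exponentiating leaves the probabilities unchanged but avoids overflow for large scores.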


SCNN Algorithm

● There are 3 main training stages

○ Pretraining

○ Differential Training

○ Coupled Training


Regressor Pretraining

● The 3 Regressors are each pretrained on the full training dataset to learn initial features that will be fine-tuned in later stages.

● Uses Least Squares Error (LSE / L2-norm) to minimize the Euclidean distance between the Regressor output and the given density map.

(Equation annotations: number of training samples; Regressor output; ground-truth density map for the given training sample.)


Differential Training

● Backpropagation is done with the same L2-norm loss on density maps as in pretraining.

● However, the choice of which Regressor to backpropagate on is determined by count error.
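
A minimal sketch of that selection step, assuming the regressor with the lowest absolute count error on a patch is the one that gets updated. The predicted counts are made-up numbers, and the function name is hypothetical.

```python
def best_regressor(predicted_counts, true_count):
    """Index of the regressor whose predicted count is closest to the truth."""
    errors = [abs(count - true_count) for count in predicted_counts]
    return min(range(len(errors)), key=errors.__getitem__)


# Hypothetical counts predicted by the three regressors for one patch.
predicted_counts = [18.4, 25.1, 31.0]
true_count = 24
winner = best_regressor(predicted_counts, true_count)
print(winner)  # 1
```

In a full training loop, each predicted count would be the sum of that regressor's output density map, and only the winning regressor's weights would receive the L2-norm gradient for the patch.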


Coupled Training

● Alternate between training the switch and backpropagating on the chosen regressor in each epoch

● This is so that the regressors and the switch are co-adapted to the training input


Evaluation

                Part A            Part B
Method          MAE     MSE       MAE     MSE
Zhang et al.    181.8   277.7     32.0    49.8
MCNN            110.2   173.2     26.4    41.3
SCNN            90.4    135.0     21.6    33.4
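
For reference, the two metrics can be computed from per-image counts as sketched below. This assumes the convention common in the crowd counting literature, where the column labelled MSE is the square root of the mean squared count error. The counts are made-up numbers, and the function names are hypothetical.

```python
import math


def mae(true_counts, predicted_counts):
    """Mean absolute error between true and predicted per-image counts."""
    n = len(true_counts)
    return sum(abs(t - p) for t, p in zip(true_counts, predicted_counts)) / n


def root_mse(true_counts, predicted_counts):
    """Square root of the mean squared count error (reported as 'MSE')."""
    n = len(true_counts)
    squared = sum((t - p) ** 2 for t, p in zip(true_counts, predicted_counts))
    return math.sqrt(squared / n)


# Hypothetical counts: three perfect predictions and one that is off by 10.
true_counts = [100, 200, 300, 400]
predicted_counts = [100, 200, 300, 390]
print(mae(true_counts, predicted_counts))       # 2.5
print(root_mse(true_counts, predicted_counts))  # 5.0
```

The single large miss raises the root-mean-square figure more than the MAE, which is why the two columns are reported together.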


DEMO


Conclusions and Further Work

● This method can give good density estimates, but it almost always undercounts actual people unless it also recognizes some other object as a person, like trees, flags, open sky…

● To be truly useful, it would probably have to be trained on fixed, known perspectives so that it could exclude non-human objects from its recognition.

○ This could occur in a security scenario, where the video perspectives are fixed, but creating the ground truths would require a lot of work.

● Future work to refine this model

○ Modify the switch architecture

○ For live video input, use a different algorithm that chooses the regressor beforehand, and see what impact this has on counts

○ Examine the difference in counts between whole input images and patched ones


References

[1] B. A. Bansal and K. Venkatesh. People counting in high density crowds from still images. 2015.

[2] D. B. Sam, S. Surya, and R. V. Babu. Switching convolutional neural network for crowd counting. CoRR, abs/1708.00199, August 2017.

[3] D. Ryan, S. Denman, S. Sridharan, and C. B. Fookes. An evaluation of crowd counting methods, features and regression models. Computer Vision and Image Understanding, 130, pp. 1-17, 2015.

[4] Y. Zhang, D. Zhou, S. Chen, S. Gao, and Y. Ma. Single image crowd counting via multi-column convolutional neural network. In CVPR, IEEE, June 2016. doi:10.1109/CVPR.2016.70.

[5] C. Zhang, H. Li, X. Wang, and X. Yang. Cross-scene crowd counting via deep convolutional neural networks. In CVPR, 2015.

[6] R. Goodier. The Curious Science of Counting a Crowd. Popular Mechanics, 2011. [online] Available at: http://www.popularmechanics.com/science/a7121/the-curious-science-of-counting-a-crowd/ [Accessed 25 Nov. 2017]
