
Recognize Complex Events from Static Images by Fusing Deep Channels
Supplementary Materials

Yuanjun Xiong^1  Kai Zhu^1  Dahua Lin^1  Xiaoou Tang^1,2

^1 Department of Information Engineering, The Chinese University of Hong Kong
^2 Shenzhen Key Lab of Comp. Vis. & Pat. Rec., Shenzhen Institutes of Advanced Technology, CAS, China

[email protected]  [email protected]  [email protected]  [email protected]

1. Model Specification

Our model is implemented in Caffe [1], a widely used programming framework for deep learning. The detailed specification of the model can be downloaded from the project website http://personal.ie.cuhk.edu.hk/~xy012/event_recog. It can be read by Caffe and edited with any text editor.
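Besides viewing the specification in a text editor, the network can also be loaded through Caffe's Python interface. The snippet below is a minimal sketch: the prototxt and caffemodel file names are placeholders, not the actual names used in the released package.

import caffe

# Placeholder file names; substitute the files from the downloaded package.
MODEL_DEF = 'event_recog_deploy.prototxt'   # network specification (plain text)
MODEL_WEIGHTS = 'event_recog.caffemodel'    # trained parameters

caffe.set_mode_cpu()

# Load the network in test (deploy) mode and list the blob shapes it defines.
net = caffe.Net(MODEL_DEF, MODEL_WEIGHTS, caffe.TEST)
for name, blob in net.blobs.items():
    print(name, blob.data.shape)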

2. WIDER Dataset

The dataset can be downloaded from the project website http://personal.ie.cuhk.edu.hk/~xy012/event_recog/WIDER.

3. Parameters of Compared Methods

We compared our method with three other methods. The detailed settings of their parameters are described below.

Gist. We use the implementation described in [4]. Images are resized to 128 × 128, and the number of orientations at each of the four scales is set to (8, 8, 8, 8).

Spatial Pyramid Matching (SPM). The implementation of the spatial pyramid matching algorithm is based on [2]. We use a three-level pyramid. The low-level visual features are SIFT descriptors, which are encoded with a codebook of 1500 visual words.
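To make this configuration concrete, the following sketch builds a three-level spatial pyramid histogram over a 1500-word codebook. It is an approximation for illustration only: OpenCV's keypoint SIFT and scikit-learn k-means stand in for the dense SIFT sampling and codebook construction of [2], and the level weighting of the pyramid match kernel is omitted.

import numpy as np
import cv2
from sklearn.cluster import MiniBatchKMeans

N_WORDS = 1500   # codebook size used in the comparison
N_LEVELS = 3     # pyramid levels: 1x1, 2x2, and 4x4 grids

def sift_features(gray):
    # SIFT keypoints and 128-d descriptors. [2] samples SIFT on a dense grid;
    # OpenCV's keypoint-based SIFT is used here only to keep the sketch short.
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    points = np.array([kp.pt for kp in keypoints])   # (x, y) positions
    return points, descriptors

def build_codebook(stacked_descriptors):
    # Quantize training descriptors into N_WORDS visual words with k-means.
    return MiniBatchKMeans(n_clusters=N_WORDS, random_state=0).fit(stacked_descriptors)

def spm_histogram(points, descriptors, codebook, width, height):
    # Three-level pyramid: at level l the image is split into 2^l x 2^l cells
    # and the visual words falling into each cell are histogrammed separately.
    words = codebook.predict(descriptors)
    hists = []
    for level in range(N_LEVELS):
        cells = 2 ** level
        col = np.clip((points[:, 0] / width * cells).astype(int), 0, cells - 1)
        row = np.clip((points[:, 1] / height * cells).astype(int), 0, cells - 1)
        for r in range(cells):
            for c in range(cells):
                in_cell = (row == r) & (col == c)
                h, _ = np.histogram(words[in_cell], bins=np.arange(N_WORDS + 1))
                hists.append(h)
    hist = np.concatenate(hists).astype(np.float32)
    return hist / max(hist.sum(), 1.0)   # L1-normalized (1+4+16)*1500-d vector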

RCNNBank. The RCNNBank is implemented following ObjectBank [3], except that the detector responses are replaced by activation features obtained with a CNN. For each image, we first take the 500 bounding boxes with the highest proposal scores and apply the CNN to each of them, which yields 500 activation feature vectors of 4096 dimensions. We then divide the image into a 3 × 3 grid of evenly spaced regions and accumulate the activation features into the regions corresponding to the positions of the bounding boxes. This yields a representation of 4096 × 9 = 36864 dimensions.
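The accumulation step can be summarized by the sketch below, which pools 500 per-box activation vectors into a 3 × 3 grid and concatenates the result into a 4096 × 9 = 36864-dimensional vector. Assigning each box to the grid cell containing its center is an assumption made for illustration; it is not necessarily the exact rule used in our implementation.

import numpy as np

GRID = 3          # 3 x 3 spatial grid
FEAT_DIM = 4096   # dimensionality of one CNN activation vector

def rcnnbank_pool(boxes, feats, img_w, img_h):
    # boxes : (N, 4) array of (x1, y1, x2, y2) proposal boxes (N = 500 here)
    # feats : (N, FEAT_DIM) array of CNN activations, one row per box
    # Each box is assigned to the grid cell containing its center (assumption),
    # and its activation vector is accumulated into that cell.
    pooled = np.zeros((GRID, GRID, FEAT_DIM), dtype=np.float32)
    cx = (boxes[:, 0] + boxes[:, 2]) / 2.0
    cy = (boxes[:, 1] + boxes[:, 3]) / 2.0
    col = np.clip((cx / img_w * GRID).astype(int), 0, GRID - 1)
    row = np.clip((cy / img_h * GRID).astype(int), 0, GRID - 1)
    for r, c, f in zip(row, col, feats):
        pooled[r, c] += f          # accumulate activations per region
    return pooled.reshape(-1)      # 4096 * 9 = 36864 dimensions

# Example with random stand-in data for 500 proposals on a 640 x 480 image.
boxes = np.random.rand(500, 4) * [640, 480, 640, 480]
boxes[:, 2:] = np.maximum(boxes[:, 2:], boxes[:, :2])   # ensure x2 >= x1, y2 >= y1
feats = np.random.rand(500, FEAT_DIM).astype(np.float32)
rep = rcnnbank_pool(boxes, feats, 640, 480)
print(rep.shape)   # (36864,)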

4. Dataset Examples

Figures 1, 2, 3, 4, 5, and 6 present sample images from the WIDER dataset, five for each class. Note that we resized all images to squares for a consistent layout.

References

[1] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.

[2] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, volume 2, pages 2169–2178. IEEE, 2006.

[3] L.-J. Li, H. Su, L. Fei-Fei, and E. P. Xing. Object bank: A high-level image representation for scene classification & semantic feature sparsification. In Advances in Neural Information Processing Systems, pages 1378–1386, 2010.

[4] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3):145–175, 2001.

Figure 1: Sample images in the dataset.

Figure 2: Sample images in the dataset.

Figure 3: Sample images in the dataset.

Figure 4: Sample images in the dataset.

Figure 5: Sample images in the dataset.

Figure 6: Sample images in the dataset.