bidirectional recurrent convolutional networks for video ...on- bidirectional recurrent...

Download Bidirectional Recurrent Convolutional Networks for Video ...on- Bidirectional Recurrent Convolutional

Post on 09-May-2020

2 views

Category:

Documents

0 download

Embed Size (px)

TRANSCRIPT

  • Bidirectional Recurrent Convolutional Networks for Video Super-Resolution

    Qi Zhang & Yan Huang

    May 10, 2017

    Center for Research on Intelligent Perception and Computing (CRIPAC) National Laboratory of Pattern Recognition (NLPR)

    Institute of Automation, Chinese Academy of Sciences (CASIA)

  • CRIPAC

    CRIPAC mainly focuses on the following research topics

    related

    to national public security.

    • Biometrics

    • Image and Video Analysis

    • Big Data and Multi-modal Computing

    • Content Security and Authentication

    • Sensing and Information Acquisition

    CRAPIC receives regular fundings from various

    Government departments or agencies. It is also

    supported by funds of R&D projects from many other

    national and international sources.

    CRIPAC members publish widely in leading national and international journals

    and conferences such as IEEE Transactions on PAMI, IEEE Transactions on

    Image Processing, International Journal of Computer Vision, Pattern

    Recognition, Pattern Recognition Letters, ICCV, ECCV, CVPR, ACCV, ICPR, ICIP, etc.

    http://cripac.ia.ac.cn/en/EN/volumn/home.shtml

    2

    http://www.springer.com/computer/image+processing/journal/11263

  • NVAIL

    Artificial Intelligence Laboratory

    Researches on artificial intelligence and deep learning

    3

  • Outline

    4

    2 Recurrent Convolutional Networks

    4 Future Work

    1 Deep Learning

    3 Application to Video Super-Resolution

  • Outline

    5

    2 Recurrent Convolutional Networks

    4 Future Work

    1 Deep Learning

    3 Application to Video Super-Resolution

  • Deep Neural Networks (DNN)

    6

    • Originate from: - 1962 – simple/complex cell, Hubel and Wiesel - 1970 – efficient error backpropagation, Linnainmaa - 1979 – deep neocognitron, convolution, Fukushima - 1987 – autoencoder, Ballard - 1989 – backpropagation for CNN, Lecun - 1991 – fundamental deep learning problem, Hochreiter - 1991 – deep recurrent neural network, Schmidhuber - 1997 – supervised LSTM RNN, Schmidhuber

    Large numbers of parameters → High computational cost

    Small training set → Over-fitting problem

    • Two drawbacks:

  • Two Recent Developments

    7

    Big Data

    10000 13000 18200

    27300

    54600

    87360

    0

    10000

    20000

    30000

    40000

    50000

    60000

    70000

    80000

    90000

    100000

    2009 2010 2011 2012 2013 2014

    Video surveillance data size (PB) Cheap Computation

    DNN can thus be fitted efficiently

  • Breakthrough in 2006 ImageNet: 74% vs. 85%

    Deep Learning:The Resurgence of DNN

    8

    2006 2012 2014

    Activity recognition, CVPR2015

    Video caption, CVPR2015

    RNN for sequence analysis

    Representation learning

    DeepFace, CVPR2014

    RCNN for detection, CVPR2014

    ∙∙∙∙∙∙

    CNN for visual tasks

    Deep Learning promotes the fast development of various visual computing areas

  • Outline

    9

    2 Recurrent Convolutional Networks

    4 Future Work

    1 Deep Learning

    3 Application to Video Super-Resolution

  • Deep Neural Networks (DNN)

    10

    𝐱

    𝐡

    𝐲

    𝐖

    • 𝐱 ∈ ℝ𝑑 , 𝐡 ∈ ℝ𝑛, 𝐖 ∈ ℝ𝑑×𝑛

    • 𝐡 = 𝜎 𝐱𝐖 , 𝜎 𝑡 = 1

    1+𝑒−𝑡

    Sigmoid function 𝜎 𝑡

  • Recurrent Neural Networks (RNN)

    11

    𝐱𝟏

    𝐡𝟏

    𝐖

    𝐱𝟐

    𝐡𝟐

    𝐖

    𝐱𝟑

    𝐡𝟑

    𝐖

    𝐔 𝐔

    • 𝐡𝒕 = 𝜎 𝐱𝒕𝐖+𝐡𝒕−𝟏𝐔

    • 𝐱𝐭 ∈ ℝ 𝑑 , 𝐡𝐭 ∈ ℝ

    𝑛, 𝐖 ∈ ℝ𝑑×𝑛, 𝐔 ∈ ℝ𝑛×𝑛

    𝐲

    𝐱

    𝐡

    𝐖

    𝐲

    RNNDNN

    Temporal dependency modeling

  • Recurrent Convolutional Networks (RCN)

    12

    DNN

    Convolutional

    CNN

    Sequential

    RNN

    Sequential

    Convolutional

    RCN

    DNN: Deep Neural Networks RNN: Recurrent Neural Networks CNN: Convolutional Neural Networks

  • Applications of RCN

    13

    Scene Labeling, NIPS15

    Action Recognition, ICLR15

    Video SR, NIPS15 & TPAMI17 Weather Nowcasting, NIPS15

    Person ReID, CVPR16Object Recognition, CVPR15

  • Outline

    14

    2 Recurrent Convolutional Networks

    4 Future Work

    1 Deep Learning

    3 Application to Video Super-Resolution

  • Video Super-Resolution

    15

    A great need for super resolving low-resolution videos

    High-resolution devices

    Display

    Low-resolution videos

    High-resolution videos

    Super-resolution: denoising, deblurring, upscaling

    Display

  • 16

    1. Single-Image super-resolution [1-6]

    Two Main Approaches (1/2)

    One-to-One scheme, super resolve each video frame independently

    Ignore the intrinsic temporal dependency relation of video frames

    [1] Dong et al., Learning a deep convolutional network for image super resolution. ECCV, 2014. [2] Timofte et al., Anchored neighborhood regression for fast example-based super resolution. ICCV, 2013. [3] Zeyde et al., On single image scale-up using sparse-representations. Curves and Surfaces, 2012. [4] Yang et al., Image super-resolution via sparse representation. IEEE TIP, 2010. [5] Bevilacqua et al., Low-complexity single-image super resolution. BMVC, 2012. [6] Chang et al., Super-resolution through neighbor embedding. CVPR, 2004.

    Low computational complexity, fast

  • 17

    Two Main Approaches (2/2)

    [7] Liu and Sun, On bayesian adaptive video super resolution. IEEE PAMI, 2014. [8] Takeda et al., Super-resolution without explicit subpixel motion estimation. IEEE TIP, 2009. [9] Mitzel et al., Video super resolution using duality based tv-l 1 optical flow. PR, 2009. [10] Protter et al. Generalizing the nonlocal-means to super-resolution reconstruction. IEEE TIP, 2009. [11] Fransens et al., Optical flow based super-resolution: A probabilistic approach. CVIU, 2007.

    2. Multi-Frame super-resolution [7-11]

    Model the temporal dependency relation by motion estimation

    Many-to-One scheme, use multiple adjacent frames to super resolve a frame

    High computational complexity, slow

  • Motivation

    18

    • RNN can model long-term contextual information of temporal sequences well

    • Convolutional operation can scale to full videos of any spatial size and temporal step

    ➢ Propose bidirectional recurrent convolutional networks, different from vanilla RNN:

    RNN: Recurrent Neural Networks SR: Super-Resolution

    1. Commonly-used full connections are replaced with weight -sharing convolutions

    2. Conditional convolutions are added for learning visual-temporal dependency relation

  • Bidirectional Recurrent Convolutional Networks

    19

    learn spatial dependency between a low-resolution frame and its high- resolution result

    model long-term temporal dependency relation across video frames

    enhance visual-temporal dependency relation modeling

  • Learning

    20

    • Define an end-to-end mapping 𝑂 ∙ from low-resolution frames 𝒳 to high-resolution frames 𝒴

    • Learning proceeds by optimizing the Mean Square Error (MSE) between predicted frames 𝑂(𝒳) and 𝒴

    𝐿 = 𝑂 𝒳 −𝒴 2

    – stochastic gradient descent

    – small learning rate in the output layer: 1e-4

  • Experiments

    21

    • Train the model on 25 YUV format video sequences

    – volume-based training

    – number of volumes: roughly 41,000

    – volume size: 32 × 32 × 10

    • Test on a variety of real world videos

    – severe motion blur

    – motion aliasing

    – complex motions

    Training videos

    Testing videos

  • PSNR Comparison

    22

    Table1: The results of PSNR (dB) and test time (sec) on the test video sequences.

    PSNR: peak signal-to-noise ratio

    [1] Video enhancer. http://www.infognition.com/videoenhancer/, version 1.9.10. 2014. [4] Bevilacqua et al., Low-complexity single-image super resolution. BMVC, 2012. [5] Chang et al., Super-resolution through neighbor embedding. CVPR, 2004. [6] Dong et al., Learning a deep convolutional network for image super resolution. ECCV, 2014. [20] Takeda et al., Super-resolution without explicit subpixel motion estimation. IEEE TIP, 2009. [22] Timofte et al., Anchored neighborhood regression for fast example-based super resolution. ICCV, 2013. [24] Yang et al., Image super-resolution via sparse representation. IEEE TIP, 2010. [25] Zeyde et al., On single image scale-up using sparse-representations. Curves and Surfaces, 2012.

    Surpass state-of-the-art methods in PSNR, due to the effective temporal dependency modelling

  • • Investigate the impact of our model architecture on the performance

    • Ta

View more >