
Real-Time Neural Style Transfer for Videos

Haozhi Huang 1,2, Hao Wang 1, Wenhan Luo 1, Lin Ma 1, Wenhao Jiang 1, Xiaolong Zhu 1, Zhifeng Li 1, Wei Liu 1
1 Tencent AI Lab   2 Tsinghua University
Correspondence: [email protected], [email protected]
Poster: openaccess.thecvf.com/content_cvpr_2017/poster/0252_POSTER.pdf

Our style transfer model consists of two parts: a stylizing network and a loss network. The stylizing network takes one frame as input and produces its corresponding stylized output. The loss network, pre-trained on the ImageNet classification task, defines a hybrid loss function including spatial and temporal components.
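Written out for a pair of consecutive frames, the hybrid loss has the following generic form, where x_{t-1}, x_t denote two consecutive input frames, x̂_{t-1}, x̂_t their stylized outputs, and λ is a weighting factor. This is only notation assumed here to summarize the description below, not the paper's exact formulation:

\mathcal{L}_{\text{hybrid}}(\hat{x}_{t-1}, \hat{x}_t)
    = \mathcal{L}_{\text{spatial}}(\hat{x}_{t-1}, x_{t-1})
    + \mathcal{L}_{\text{spatial}}(\hat{x}_{t}, x_{t})
    + \lambda \, \mathcal{L}_{\text{temporal}}(\hat{x}_{t-1}, \hat{x}_{t})

The spatial term is the per-frame stylization loss defined through features of the ImageNet-pre-trained loss network (in the spirit of the perceptual losses of [2]); the temporal term penalizes inconsistency between the two stylized frames.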

During training, two consecutive input frames x_{t-1} and x_t are fed to the stylizing network, producing two consecutive output frames x̂_{t-1} and x̂_t. The two output frames are then fed to the loss network to compute the hybrid loss. Specifically, the spatial loss is computed separately for each of the two output frames, while the temporal loss is computed from both of them.
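As a rough illustration of this two-frame synergic training step, here is a minimal PyTorch-style sketch. It is not the authors' implementation: stylizer is a toy stand-in for the stylizing network, spatial_loss is a pixel-wise proxy for the perceptual loss the ImageNet-pre-trained loss network would compute, temporal_loss is a plain difference between the two stylized frames, and the weight lambda_t is an assumed value.

# Minimal sketch of the two-frame synergic training step (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for the stylizing network (the real network is a much deeper
# convolutional encoder-decoder).
stylizer = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, kernel_size=3, padding=1),
)

def spatial_loss(stylized, content):
    # Pixel-wise proxy for the perceptual (content + style) loss that the
    # ImageNet-pre-trained loss network would compute from feature maps.
    return F.mse_loss(stylized, content)

def temporal_loss(stylized_prev, stylized_cur):
    # Plain difference between consecutive stylized frames; a simplified
    # stand-in for the temporal component described on the poster.
    return F.mse_loss(stylized_cur, stylized_prev)

optimizer = torch.optim.Adam(stylizer.parameters(), lr=1e-3)
lambda_t = 10.0  # assumed weight balancing spatial and temporal terms

def training_step(x_prev, x_cur):
    # Two consecutive input frames -> two consecutive stylized frames,
    # then a single hybrid loss over both.
    y_prev, y_cur = stylizer(x_prev), stylizer(x_cur)
    loss = (spatial_loss(y_prev, x_prev)
            + spatial_loss(y_cur, x_cur)
            + lambda_t * temporal_loss(y_prev, y_cur))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example with random 64x64 RGB frames (batch of 1).
print(training_step(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)))

The important point is the structure of the objective: both frames pass through the same stylizing network, each gets its own spatial loss, and a single temporal loss couples the two outputs.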

After training, since the stylizing network has already incorporated temporal consistency, we can apply it frame by frame to an arbitrary input video to generate a temporally coherent stylized video.
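At inference time the trained network is applied to each frame independently, with no optical flow computed on the fly. A minimal sketch, reusing the stylizer module from the training sketch above:

import torch

def stylize_video(stylizer, frames):
    # frames: a list of 1x3xHxW tensors, one per video frame.
    # The network is applied to every frame independently; temporal coherence
    # comes from how the network was trained, not from any per-frame coupling.
    with torch.no_grad():
        return [stylizer(f) for f in frames]

# Example: stylized = stylize_video(stylizer, [torch.rand(1, 3, 64, 64) for _ in range(8)])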

Some results for different styles are shown below.

Introduction

In this paper, we explore the possibility of exploiting a feed-forward neural network to perform style transfer for videos while simultaneously maintaining temporal consistency among stylized video frames.

Our key contributions are:
• A novel real-time style transfer method for videos, which is based solely on a feed-forward convolutional neural network and avoids computing optical flows on the fly.
• A demonstration that a feed-forward convolutional neural network supervised by a hybrid loss can not only stylize each video frame well but also maintain temporal consistency, which is empowered by the proposed two-frame synergic training method.

Previous Methods

Iterative image style transfer [1]
● The first method to use a deep neural network for image style transfer.
■ Based on a very slow iterative optimization.

Feed-forward image style transfer [2]
● No iterative optimization; fast.
■ No temporal consistency.

Iterative video style transfer [3]
● Incorporates temporal consistency.
■ Based on a very slow iterative optimization.

Two-Frame Synergic Training

[Figure: stylized frames produced without temporal consistency [2] vs. with temporal consistency (ours).]

Comparisons

Comparison to previous methods in the literature (Proposed vs. Ruder's [3] vs. Johnson's [2]). The error maps show the pixel-wise color difference between corresponding pixels of consecutive frames: the whiter the color, the larger the error.

Comparison to commercial software (Proposed vs. Artisto vs. Prisma). Check the zoomed areas between consecutive stylized frames; the results suggest that the temporal consistency of our results is better.

References

[1] L. A. Gatys, A. S. Ecker, and M. Bethge. Image style transfer using convolutional neural networks. In Proc. CVPR, 2016.
[2] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In Proc. ECCV, 2016.
[3] M. Ruder, A. Dosovitskiy, and T. Brox. Artistic style transfer for videos. In Proc. German Conference on Pattern Recognition, 2016.