Supplementary Material of “Iterative Residual CNNs for Burst Photography Applications”

1. Hyperparameters

In Table 1 we provide the values of all the hyper-parameters that we used to train our models. The ProxNet was pre-trained on the single-frame Gaussian denoising task using the Berkeley Segmentation Dataset (BSDS) [9], which consists of 500 color images. All the images were randomly cropped into patches of size 180 × 180 pixels. The patches were perturbed with noise of standard deviation σ ∈ [0, 15] and the network was optimized to minimize the mean squared error. While pre-training of the denoiser is not necessary in our approach, we experimentally found that a pre-trained denoiser typically speeds up the training of our iterative neural network. This can be attributed to the fact that a pre-trained denoiser serves as a proper initialization for the proximal network (ProxNet).
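As an illustration of this pre-training stage, a minimal sketch is given below, assuming a PyTorch-style `ProxNet` module and a loader of 180 × 180 BSDS crops; the function and variable names, as well as the optimizer settings, are ours and not the released training code.

```python
import torch
import torch.nn.functional as F

# Illustrative pre-training loop for the denoiser (ProxNet). `proxnet` and
# `bsds_patch_loader` are placeholder names, not the authors' released code.
def pretrain_proxnet(proxnet, bsds_patch_loader, epochs, lr=1e-3, device="cuda"):
    proxnet = proxnet.to(device)
    optimizer = torch.optim.Adam(proxnet.parameters(), lr=lr)  # optimizer/lr assumed
    for _ in range(epochs):
        for clean in bsds_patch_loader:          # 180x180 BSDS crops, shape (B, 3, 180, 180)
            clean = clean.to(device)
            # Sample a noise level per patch with sigma in [0, 15] (image range [0, 255]).
            sigma = torch.empty(clean.size(0), 1, 1, 1, device=device).uniform_(0.0, 15.0)
            noisy = clean + sigma * torch.randn_like(clean)
            denoised = proxnet(noisy)
            loss = F.mse_loss(denoised, clean)   # minimize the mean squared error
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return proxnet
```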

2. Computational Cost

Representative execution times of our burst image demosaicking approach for images of various sizes are provided in Fig. 2. These results refer to the average execution time of 10 runs on an NVIDIA TitanX. From this figure it is clear that increasing the number of frames inside the burst yields only a small increase in the computational cost. The reason is that our proximal network accepts as input a single image and not the entire burst sequence. The burst frames are only used to compute the gradient of the data fidelity term, which is then combined with the output of the proximal network from the previous iteration to form the input of the current iteration. The gradient of the data fidelity term can be computed efficiently in parallel and, thus, increasing the number of frames does not increase the computation time linearly. On the contrary, the networks introduced in [1, 4] are applied separately on each frame of the burst, and thus their computational cost increases linearly with the size of the burst.
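To make this argument concrete, the sketch below shows one plausible form of such an iteration, where the data-fidelity gradient over all B frames is evaluated in a single batched (parallel) pass while the proximal network only ever sees one image. The callables `warp`, `warp_T` and `mosaic_mask`, standing in for S_i, S_iᵀ and H, are placeholders for illustration, not the exact implementation.

```python
def data_fidelity_grad(x, burst, warp, warp_T, mosaic_mask, sigma):
    """Gradient of f(x) = 1/(2 sigma^2 B) * sum_i ||H S_i x - y_i||^2.

    All tensors are PyTorch tensors: x has shape (C, H, W) and burst has
    shape (B, C, H, W). `warp`/`warp_T` apply S_i and its adjoint to a stack
    of frames at once, and `mosaic_mask` plays the role of the binary diagonal
    H (all ones for burst denoising). These operators are placeholders.
    """
    B = burst.shape[0]
    residual = mosaic_mask * warp(x.expand_as(burst)) - burst   # H S_i x - y_i for every i in parallel
    return warp_T(mosaic_mask * residual).sum(dim=0) / (sigma ** 2 * B)

def iteration_step(x, burst, proxnet, warp, warp_T, mosaic_mask, sigma, step):
    # Only the data-fidelity gradient touches the whole burst; the proximal
    # network receives a single image, so the per-iteration cost grows slowly
    # with the number of frames.
    grad = data_fidelity_grad(x, burst, warp, warp_T, mosaic_mask, sigma)
    return proxnet(x - step * grad)
```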

3. Failure Cases

The main limitation of our network lies in its dependence on the ECC estimation of the warping matrix [2]. This estimation method can be rather inaccurate, especially in the presence of strong noise. Therefore, when the estimated homography matrix is imprecise, our network will inevitably introduce ghosting artifacts into the final result, such as those depicted in Fig. 1. Indeed, in these examples, while the images have been denoised successfully, false edges appear in the restored images due to the imprecise warping estimation.

Figure 1: Failure cases of our approach on burst Gaussian denoising where the homography estimation between the burst frames is imprecise. (Columns: Ground Truth, Input, Ours.)
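For reference, a homography of this kind can be estimated with the ECC algorithm of [2] as implemented in OpenCV; the snippet below is a generic sketch of such an alignment step, not the exact configuration used in our experiments.

```python
import cv2
import numpy as np

def estimate_ecc_homography(reference, frame, iters=100, eps=1e-6):
    """Estimate a homography aligning `frame` to `reference` with ECC [2].

    Generic OpenCV-based sketch; the settings used in the paper may differ.
    Under strong noise this estimate can become imprecise, which is the
    failure mode illustrated in Fig. 1.
    """
    ref_gray = cv2.cvtColor(reference, cv2.COLOR_BGR2GRAY)
    frm_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    warp = np.eye(3, 3, dtype=np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, iters, eps)
    _, warp = cv2.findTransformECC(ref_gray, frm_gray, warp,
                                   cv2.MOTION_HOMOGRAPHY, criteria)
    return warp

def align_to_reference(reference, frame, warp):
    # Warp the frame onto the reference grid using the estimated homography.
    h, w = reference.shape[:2]
    return cv2.warpPerspective(frame, warp, (w, h),
                               flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP)
```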

4. Additional Burst Denoising Results

In Fig. 4, we provide additional qualitative comparisons for the burst Gaussian denoising task. From these results we clearly observe that our denoising approach yields superior restorations, where fine details have been retained while the noise has been efficiently suppressed. We note that this is not the case for ResDNet and VBM4D, where the results either suffer from noise artifacts or have fine details entirely wiped out.


                        Burst Gaussian   Burst Noise-free   Burst
                        Denoising        Demosaicking       Demosaicking
Initial Learning Rate   0.01             0.003              0.003
Iterations K            10               10                 10
TBPTT k                 5                5                  5
Batch Size              10               10                 10
Noise Estimation        True             False*             True
s_max                   ln(2)            ln(15)             ln(2)
s_min                   ln(1)            ln(1)              ln(1)
Epochs                  300              300                300

Table 1: Hyper-parameters used during training for each trained model. *For the case of noise-free demosaicking we consider the presence of low-level noise of fixed standard deviation equal to σ = 1.
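For convenience, the values of Table 1 can also be read as a plain configuration; the dictionary below is one possible transcription, where the key names are illustrative and do not correspond to our released code.

```python
import math

# Table 1 transcribed as plain dictionaries; key names are illustrative.
TRAINING_CONFIGS = {
    "burst_gaussian_denoising": dict(
        initial_lr=0.01, iterations_K=10, tbptt_k=5, batch_size=10,
        noise_estimation=True, s_max=math.log(2), s_min=math.log(1), epochs=300),
    "burst_noise_free_demosaicking": dict(
        initial_lr=0.003, iterations_K=10, tbptt_k=5, batch_size=10,
        # "noise-free" still assumes low-level noise of fixed sigma = 1
        noise_estimation=False, s_max=math.log(15), s_min=math.log(1), epochs=300),
    "burst_demosaicking": dict(
        initial_lr=0.003, iterations_K=10, tbptt_k=5, batch_size=10,
        noise_estimation=True, s_max=math.log(2), s_min=math.log(1), epochs=300),
}
```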

Figure 2: Execution time (sec) of burst demosaicking for 0.25, 1 and 2 megapixel images versus the number of used burst frames. Increasing the number of frames yields only a minor increase in computational complexity.
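These timings can be reproduced with a simple measurement loop of the kind sketched below, assuming a placeholder `model` and a burst tensor already on the GPU; explicit synchronization is needed so that asynchronous CUDA kernels are included in the measured time.

```python
import time
import torch

@torch.no_grad()
def average_runtime(model, burst, runs=10):
    """Average wall-clock time of `runs` forward passes on the GPU.

    `model` and `burst` are placeholders for the burst-demosaicking network
    and an input of shape (num_frames, channels, height, width).
    """
    model(burst)                        # warm-up run, excluded from timing
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        model(burst)
    torch.cuda.synchronize()            # wait for all kernels before stopping the clock
    return (time.perf_counter() - start) / runs
```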

5. Calculation of the Lipschitz Constant

For any pair of variables z, x ∈ ℝ^N we have

$$
\begin{aligned}
\|\nabla_{x}f(x)-\nabla_{z}f(z)\|_{2}
&= \left\|\frac{1}{\sigma^{2}B}\sum_{i=1}^{B}S_{i}^{T}H^{T}(HS_{i}x-y)-S_{i}^{T}H^{T}(HS_{i}z-y)\right\|_{2}\\
&= \frac{1}{\sigma^{2}B}\left\|\sum_{i=1}^{B}S_{i}^{T}H^{T}HS_{i}x-S_{i}^{T}H^{T}HS_{i}z\right\|_{2}\\
&\le \frac{1}{\sigma^{2}B}\sum_{i=1}^{B}\left\|S_{i}^{T}H^{T}HS_{i}x-S_{i}^{T}H^{T}HS_{i}z\right\|_{2}\\
&= \frac{1}{\sigma^{2}B}\sum_{i=1}^{B}\left\|S_{i}^{T}H^{T}HS_{i}(x-z)\right\|_{2}\\
&\le \frac{1}{\sigma^{2}B}\sum_{i=1}^{B}\left\|S_{i}^{T}H^{T}HS_{i}\right\|\,\|x-z\|_{2}\\
&\le \frac{1}{\sigma^{2}}\,\|x-z\|_{2}.
\end{aligned}
$$

Note that the first two inequalities follow from the relation between the norms defined in the ℝ^N space. To compute an upper bound of the spectral norm $\|S_{i}^{T}H^{T}HS_{i}\|$, we exploit the fact that both matrices H and S have values bounded between 0 and 1. This holds since the affine transformation matrix S_i contains the interpolation coefficients for the weighted average of the pixel intensities, and by definition these coefficients sum up to one. At the same time, for the problems that we consider, the matrix H is either the identity (for the burst denoising problem) or a binary diagonal matrix (for the burst demosaicking problem). This immediately implies that $\|S_{i}^{T}H^{T}HS_{i}\| \le 1$ and, hence, an upper bound of the Lipschitz constant of $\nabla_{x}f(x)$ is given by $L(x) \le \frac{1}{\sigma^{2}B}\sum_{i=1}^{B}\|S_{i}^{T}H^{T}HS_{i}\| \le 1/\sigma^{2}$.
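As a small numerical illustration of the bound $\|S_{i}^{T}H^{T}HS_{i}\| \le 1$ (a toy sanity check, not part of the method), the example below builds a 1-D interpolation matrix whose rows are non-negative weights summing to one, applies a binary diagonal mask, and checks the spectral norm.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_interp_matrix(n, shift=0.3):
    """1-D linear interpolation of a fractional circular shift: each row of S
    holds non-negative weights that sum to one, mimicking a row of S_i."""
    S = np.zeros((n, n))
    for i in range(n):
        S[i, i] = 1.0 - shift
        S[i, (i + 1) % n] = shift
    return S

n = 64
S = toy_interp_matrix(n)
H = np.diag((rng.random(n) < 0.5).astype(float))    # binary diagonal mask, as in demosaicking

spectral_norm = np.linalg.norm(S.T @ H.T @ H @ S, ord=2)
print(f"||S^T H^T H S|| = {spectral_norm:.6f}")     # stays <= 1, as argued above
assert spectral_norm <= 1.0 + 1e-9
```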

6. Additional FlexISP Results

In Table 2, we provide the PSNR scores for the synthetic bursts of the FlexISP dataset. Our iterative network outperforms all previous approaches, with an improvement in PSNR ranging from 1.2 to 3 dB. The same conclusion about the superiority of our joint denoising-demosaicking approach can also be confirmed visually by referring to the results provided in Fig. 3.

                        Single Frame          Multi Frame
Image          Input    Gharbi   Kokkinos     FlexISP   ProxImal   HDR+    Godard   Ours
Flickr Doll    20.25    29.61    28.33        29.41     30.23      26.52   29.39    31.22
Kodak Fence    26.28    32.82    32.93        34.42     -          30.07   34.08    35.24
Living Room    21.74    34.4     34.61        32.27     -          25.79   31.32    37.53

Table 2: Comparisons of several methods for the burst demosaicking task on the synthetic FlexISP dataset.

Figure 3: Burst demosaicking results on a real and two synthetic bursts from the FlexISP dataset [6]. Our model successfully restores the missing colors of the underlying images while suppressing noise. (Columns: Ground Truth, Input, Ours, HDR+ [5], FlexISP [6], Godard [4], Gharbi [3], Kokkinos [7].)

References

[1] M. Aittala and F. Durand. Burst image deblurring using permutation invariant convolutional neural networks. In The European Conference on Computer Vision (ECCV), September 2018.

[2] G. D. Evangelidis and E. Z. Psarakis. Parametric image alignment using enhanced correlation coefficient maximization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(10):1858–1865, Oct. 2008.

[3] M. Gharbi, G. Chaurasia, S. Paris, and F. Durand. Deep joint demosaicking and denoising. ACM Trans. Graph., 35(6):191:1–191:12, Nov. 2016.

[4] C. Godard, K. Matzen, and M. Uyttendaele. Deep burst denoising. In The European Conference on Computer Vision (ECCV), September 2018.

[5] S. W. Hasinoff, D. Sharlet, R. Geiss, A. Adams, J. T. Barron, F. Kainz, J. Chen, and M. Levoy. Burst photography for high dynamic range and low-light imaging on mobile cameras. ACM Transactions on Graphics (Proc. SIGGRAPH Asia), 35(6), 2016.

[6] F. Heide, M. Steinberger, Y.-T. Tsai, M. Rouf, D. Pajak, D. Reddy, O. Gallo, J. Liu, W. Heidrich, K. Egiazarian, et al. FlexISP: A flexible camera image processing framework. ACM Transactions on Graphics (TOG), 33(6):231, 2014.

[7] F. Kokkinos and S. Lefkimmiatis. Deep image demosaicking using a cascade of convolutional residual denoising networks. In The European Conference on Computer Vision (ECCV), September 2018.

[8] M. Maggioni, G. Boracchi, A. Foi, and K. Egiazarian. Video denoising using separable 4D nonlocal spatiotemporal transforms. In Image Processing: Algorithms and Systems IX, volume 7870, page 787003. International Society for Optics and Photonics, 2011.

[9] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV), volume 2, pages 416–423, 2001.


Figure 4: Burst Gaussian denoising with σ = 25. Our method is able to effectively restore the images and retain fine details, as opposed to the rest of the methods that over-smooth high-texture areas. Results best seen magnified on a computer screen. (Columns: Ground Truth, Average, Ours, ResDNet, ResDNet Average, VBM4D [8].)