
Estimation of Ambient Light and Transmission Map with Common Convolutional Architecture

Young-Sik Shin∗, Younggun Cho∗, Gaurav Pandey†, Ayoung Kim∗
∗Department of Civil and Environmental Engineering, KAIST, S. Korea
email: {youngsik.shin, yg.cho, ayoungk}@kaist.ac.kr

†Department of Electrical Engineering, IIT, Kanpur, India
email: [email protected]

Abstract—This paper presents a method for effective ambient light and transmission estimation in underwater images using a common convolutional network architecture. The estimated ambient light and the transmission map are used to dehaze the underwater images. Dehazing underwater images is especially challenging due to the unknown and significantly varying ambient light in underwater environments. Unlike common dehazing methods, the proposed method is capable of estimating ambient light along with the transmission map, thereby improving the reconstruction quality of the dehazed images. We evaluate the dehazing performance of the proposed method on real underwater images and also compare our method to current state-of-the-art techniques.

I. INTRODUCTION

Capturing high-resolution colored images from underwater environments has many applications in ocean engineering. A good quality image from the deep sea can be very useful for scientists studying various underwater phenomena. Despite significant advancements in camera technology, high-quality underwater image acquisition remains an unsolved problem. The scattering of light from water particles, along with the attenuation and color change of different wavelengths of ambient light (including external light sources), causes a hazing effect in captured underwater images, as shown in Fig. 1(a). This hazing effect needs to be removed from the images so that a clear picture of the underwater scene can be visualized.

Several methods for haze removal using prior information have been proposed in the past [1]–[6]. Schechner [1] used prior information from multiple images taken under different environmental conditions and at different degrees of polarization. Narasimhan [2] used the available depth information to enhance the dehazing performance. Fattal [3] presented the first single-image dehazing technique, using independent component analysis (ICA) to decorrelate the transmission and surface shading. This technique relies on the assumption that the transmission map and the surface shading factor are locally uncorrelated. He et al. [4] proposed a haze removal method using the dark channel prior (DCP). This approach uses a strong prior that at least one of the color channels has low intensity in haze-free images. Zhu et al. [5] proposed the color attenuation prior (CAP), which uses the fact that the saturation of hazy pixels in an image is much lower than that of haze-free pixels. Carlevaris-Bianco et al. [6] used the strong difference between the red color channel and the other channels to estimate the depth of the scene from a single underwater image.

Fig. 1. Haze removal on underwater images. (a) Original hazy image. (b) The resulting dehazed image.

Recently, convolutional neural networks (CNNs) have presented promising solutions to many vision tasks [7]–[12], including dehazing [13], [14]. Mai et al. [13] proposed a back propagation neural network (BPNN) model to estimate the transmission map. Cai et al. [14] proposed a CNN architecture called DehazeNet for the estimation of the transmission map. A hazy image is provided as input to the convolutional architecture, and a regression model is learned to predict the transmission map. This transmission map is then used to remove haze using an atmospheric scattering model. This is the work most closely related to ours; however, the proposed method differs in that we also estimate ambient light in addition to the transmission map for a better reconstruction of the hazy image.

Image dehazing is an effective approach to increase the visibility and recover the real radiance of a hazy image. It should be noted that the ambient light in underwater environments is significantly biased, and estimating it accurately improves many underwater vision applications. However, despite the utility of correctly estimating ambient light, it is usually selected somewhat arbitrarily, e.g., from the brightest or the median pixel within the region of lowest estimated transmission [4]–[6], [15]. Therefore, in this work we focus on the fast and effective estimation of both ambient light and the transmission map. We propose a common convolutional architecture to simultaneously estimate ambient light and the transmission map for visibility enhancement of a scene degraded by the underwater environment.

The rest of the paper is organized as follows. In section §II, we describe the atmospheric scattering model used in this work. Section §III describes the proposed convolutional architecture. Section §IV presents the results from simulated and real data captured underwater. In section §V, we present our concluding remarks.

Fig. 2. The overall convolutional architecture. The network contains three stages: a multi-scale fusion stage (parallel conv 3×3, conv 5×5, and conv 7×7 branches), a multi-scale feature extraction stage (maxout, conv 3×3, ReLU, and pooling layers), and a nonlinear regression stage (conv 3×3 and ReLU), mapping a hazy image to either the transmission map or the ambient light.

II. ATMOSPHERIC SCATTERING MODEL

In this paper, we adopt the haze model described in [4], which considers the hazy image as a weighted sum of the haze-free image J and the ambient light A. For a pixel (u, v), the hazy pixel value I(u, v) is modeled as shown below

I(u, v) = J(u, v) t(u, v) + A (1 − t(u, v)),  (1)

where I is the observed hazy image, J is the scene radiance, A is the global atmospheric light, and t is the transmission, i.e., the portion of light that reaches the camera without being scattered. The transmission value t(u, v) at any pixel decreases exponentially with the distance the light travels

t(u, v) = e^(−βd(u,v)),  (2)

where d(u, v) is the depth of the scene point, and β is the attenuation coefficient of the medium. If we can resolve the transmission map t and the global atmospheric light A from a given hazy image, the original scene radiance J can be recovered from the model given in (1).
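For concreteness, the following is a minimal NumPy sketch of the model in (1)–(2); the function names and array shapes are our own illustration, not from the paper.

import numpy as np

def synthesize_haze(J, depth, beta, A):
    """Hazy image I = J*t + A*(1 - t), with t = exp(-beta*d), eqs. (1)-(2).
    J: (H, W, 3) haze-free image in [0, 1]; depth: (H, W); A: scalar or (3,)."""
    t = np.exp(-beta * depth)          # transmission map from scene depth, eq. (2)
    t = t[..., np.newaxis]             # broadcast (H, W) over the RGB channels
    return J * t + A * (1.0 - t)       # eq. (1)

def recover_radiance(I, t, A):
    """Algebraic inversion of eq. (1) once t and A are known
    (clamping of small t is discussed later, in eq. (4))."""
    t = t[..., np.newaxis]
    return (I - A * (1.0 - t)) / t     # J = (I - A(1 - t)) / t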

III. CONVOLUTIONAL NEURAL NETWORK (CNN) FOR AMBIENT LIGHT AND TRANSMISSION ESTIMATION

A. Model Architecture

The proposed CNN architecture is composed of three stages of network propagation, as shown in Fig. 2. The first stage of the network is a multi-scale fusion stage inspired by [16]. In this stage, we use an element-wise summation of pixels for dimensionality reduction. This allows us to generate more feature maps in each layer, thereby improving the training accuracy. Since we estimate both ambient light and the transmission map from the same architecture, we need to fix the value of one while training for the other variable. Using more feature maps reduces the uncertainty of the unknown variables (e.g., transmission when learning the ambient light network, and vice versa). The final stage consists of a nonlinear regression layer for ambient light and transmission map estimation.

A similar idea was recently proposed in [14], which used a multi-scale mapping layer and a maxout operation for haze-relevant feature extraction. However, our model differs in that we estimate both ambient light and the transmission map from the same network architecture.

1) Multi-scale fusion: The first stage in the architecture is the multi-scale fusion layer, which has been widely used for image enhancement, including single image dehazing [14], [16], [17]. We use three parallel convolutional layers, with filters of size [3 × 3 × 32], [5 × 5 × 32], and [7 × 7 × 32], respectively. We choose a larger number of feature maps than DehazeNet to reduce the uncertainty in the unknown variables. Moreover, at the end of this stage we perform an element-wise summation, whereas DehazeNet only stacks up the multi-scale layers. The summation of the multi-scale layers helps to reduce the computational complexity in the later stages.
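As an illustration, here is a minimal PyTorch re-implementation sketch of this fusion stage (the paper's training actually used Caffe); the "same" padding choices that keep the three branches spatially aligned are our assumption.

import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    """Three parallel convolutions (3x3, 5x5, 7x7, each with 32 output maps)
    fused by element-wise summation rather than channel concatenation."""
    def __init__(self, in_ch=3, out_ch=32):
        super().__init__()
        # padding = (k - 1) / 2 keeps the spatial size of each branch identical
        self.conv3 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.conv5 = nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2)
        self.conv7 = nn.Conv2d(in_ch, out_ch, kernel_size=7, padding=3)

    def forward(self, x):
        # element-wise sum keeps 32 channels; concatenation would give 96
        return self.conv3(x) + self.conv5(x) + self.conv7(x)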

2) Feature extraction: To handle the ill-posed nature of single image dehazing, previous methods have assumed various features that are closely related to the properties of hazy images. For example, the dark channel, hue disparity, and RGB channel disparity are utilized as haze-relevant features [4]–[6].

Inspired by these previous methods, the second stage is designed to extract haze-relevant features. This stage consists of a maxout unit, two convolutional layers with a Rectified Linear Unit (ReLU) activation function, and a max-pooling layer. The maxout unit [18] is selected to find features along the depth (channel) direction of the input data. After the maxout unit, we use two convolutional layers with filters of size [3 × 3 × 32]. Lastly, the max-pooling layer is chosen to obtain spatially invariant features. In conventional CNNs, max-pooling layers are generally used to overcome local sensitivity and to reduce the resolution of feature maps. In contrast, we apply the pooling operation densely to prevent loss of resolution, which makes the CNN usable for image restoration.
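Below is a hedged PyTorch sketch of this stage; the number of maxout groups and the pooling window are our assumptions, chosen so that stride-1 ("dense") pooling preserves resolution.

import torch
import torch.nn as nn

class Maxout(nn.Module):
    """Max over channel groups, i.e., along the depth direction of the input."""
    def __init__(self, groups=8):
        super().__init__()
        self.groups = groups

    def forward(self, x):
        n, c, h, w = x.shape
        # split c channels into `groups` groups and take the max within each
        return x.view(n, self.groups, c // self.groups, h, w).max(dim=2).values

class FeatureExtraction(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.maxout = Maxout(groups=8)                      # 32 -> 8 channels
        self.conv1 = nn.Conv2d(8, ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        # stride-1 max pooling: spatial invariance without losing resolution
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        x = self.maxout(x)
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        return self.pool(x)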

3) Nonlinear regression: The last stage is the nonlinear regression layer, which performs the estimation of the transmission and the ambient light. The convolutional layer used in this stage consists of a single filter of size [3 × 3 × 32]. We also add the widely used ReLU layer after every convolutional layer to avoid problems of slow convergence and local minima during the training phase [7], [10], [11].

Fig. 3. The process of haze removal on underwater images: (a) the original image, (b) the transmission map, and (c) the estimated ambient light. Finally, the dehazed image is recovered in (d).

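Assembling the three stages, a minimal sketch of the full estimator follows; it reuses the MultiScaleFusion and FeatureExtraction sketches above, and the regression head follows the single [3 × 3 × 32] filter description. Since the paper fixes one variable while training for the other, we instantiate one network per target (transmission or ambient light) with the same common architecture.

import torch.nn as nn

class DehazeEstimator(nn.Module):
    """Common three-stage architecture: fusion -> features -> regression."""
    def __init__(self):
        super().__init__()
        self.fusion = MultiScaleFusion()        # stage 1 (sketched above)
        self.features = FeatureExtraction()     # stage 2 (sketched above)
        self.regress = nn.Sequential(           # stage 3: nonlinear regression
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),              # ReLU follows every conv layer
        )

    def forward(self, x):
        return self.regress(self.features(self.fusion(x)))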

B. Training of CNN

1) Training data: Training the CNN requires pairs of hazy patches and the corresponding haze-free information (e.g., transmission map and ambient light). In practice, it is very difficult to obtain such a training dataset experimentally. Therefore, we use the haze model in (1) and synthesize hazy patches from haze-free image patches to train our CNN architecture. We use two publicly available datasets, ICL-NUIM [19] and the SUN database [20], for training. We apply a random transmission t ∈ (0, 1) and a random ambient light A ∈ (0, 1) to small haze-free image patches, assuming that transmission and ambient light are locally constant on small image patches. Note that we use random ambient light to mimic underwater images, which is generally a valid assumption for underwater environments. Moreover, the dataset generated in this manner enables the network to estimate the transmission more accurately on hazy images with color distortions. In this way, we generate a large number of hazy image patches from haze-free image patches with random transmission and ambient light. This training dataset is used to learn the three-stage CNN architecture described above.
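A hedged sketch of this synthesis step is given below: haze-free patches are turned into hazy patches via eq. (1) with a random t and a random (possibly color-biased) ambient light A, both held constant per patch. The patch size and sampling ranges are our assumptions.

import numpy as np

def make_hazy_patch(clean_patch, rng):
    """clean_patch: float array in [0, 1] of shape (H, W, 3)."""
    t = rng.uniform(0.0, 1.0)                 # one transmission per patch
    A = rng.uniform(0.0, 1.0, size=3)         # per-channel ambient light (bias)
    hazy = clean_patch * t + A * (1.0 - t)    # eq. (1) with locally constant t, A
    return hazy.astype(np.float32), np.float32(t), A.astype(np.float32)

rng = np.random.default_rng(0)
patch = rng.uniform(0.0, 1.0, size=(16, 16, 3))   # stand-in for a dataset crop
hazy, t_label, A_label = make_hazy_patch(patch, rng)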

2) Training Method: In the proposed model, we use supervised learning between hazy image patches and label data (such as the transmission value or the ambient light value). The filter weights in the model are learned by minimizing a loss function. Given pairs of hazy patches generated by the above method and their corresponding labels, we use the mean squared error (MSE) as the loss function,

L(Θ) = (1/N) Σ_{i=1}^{N} ‖F(p_i; Θ) − l_i‖²,  (3)

where p_i is an input hazy patch, l_i is its label, and Θ denotes the filter weights. We employ the widely used stochastic gradient descent (SGD) algorithm to train our model.
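For illustration, a minimal PyTorch training-step sketch for (3) follows; the optimizer hyperparameters and the label layout (the per-patch constant broadcast to the network's output shape) are our assumptions.

import torch
import torch.nn as nn

model = DehazeEstimator()                       # common architecture, one target
criterion = nn.MSELoss()                        # MSE loss, eq. (3)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def train_step(patches, labels):
    """patches: (N, 3, H, W) hazy patches;
    labels: (N, 1, H, W) per-patch constants broadcast to the output shape."""
    optimizer.zero_grad()
    pred = model(patches)                       # F(p_i; Θ)
    loss = criterion(pred, labels)              # mean squared error to l_i
    loss.backward()                             # backpropagate through all stages
    optimizer.step()                            # SGD update of Θ
    return loss.item()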

C. Balanced Scene Radiance Recovery

When the transmission t(u, v) and the atmospheric light A are obtained, the original scene radiance J can be recovered from the atmospheric scattering model in (1). It is conventionally recovered from the inverse atmospheric scattering model as shown below

J(u, v) = (I(u, v) − A) / max(t(u, v), t0) + A.  (4)

However, this model cannot recover the original scene radiance in an underwater environment. The attenuation of ambient light underwater depends not only on the distance travelled and the density of particles in the light path, but also on the color/wavelength of the light. For instance, the intensity of the red channel decreases rapidly, whereas the intensity of the blue or green channel decreases slowly. Hence, the ambient light component in images captured underwater is not the true ambient light, which affects the recovery of the scene radiance from the conventional atmospheric scattering model in (4). Therefore, to solve this problem, we propose a novel balanced scene recovery model as shown below

J(u, v) = (I(u, v) − A) / max(t(u, v), t0) + A_b,  (5)

in which the first term is the direct scene radiance and A_b is the balanced ambient light term.

Here, t(u, v) is the estimated transmission value, A is the ambient light estimated by our CNN model, and A_b is the balanced ambient light, defined as

A_b = ‖A‖ a_b,  (6)

where a_b is the fixed vector [1/√3, 1/√3, 1/√3] representing the balanced ambient light direction in RGB space. Here we assume that the balanced ambient light has the same magnitude (‖A‖) as the estimated ambient light, with equal contributions from all three color channels. The proposed dehazing process, with the image reconstructed using the balanced scene radiance recovery model, is shown in Fig. 3.

Fig. 5. Error statistics as a function of the saturation value of the ambient light on 15K synthetic patches, for (a, e) He [4], (b, f) Zhu [5], (c, g) Cai [14], and (d, h) the proposed method. The red line shows the mean estimation error and the gray band its variance. The first row represents the results under balanced ambient light; the second row shows the results under biased ambient light.

IV. RESULTS

We trained the proposed architecture on about 1 million synthetic hazy patches generated from two publicly available datasets (ICL-NUIM [19] and the SUN database [20]). A mixture of patches from the two datasets was used to capture both indoor (ICL-NUIM) and outdoor (SUN database) scenes. We used the open-source Caffe framework [21] to train our convolutional network.

We performed several experiments to verify the robustness of the proposed convolutional architecture. We also compared the proposed method with several state-of-the-art dehazing methods available in the literature [4]–[6], [14]. These algorithms can be broadly classified into (i) conventional computer vision techniques that use prior information [4]–[6] and (ii) CNN-based dehazing methods such as DehazeNet [14] and the proposed method.

Fig. 4. Two types of synthetic hazy patches. (a) Balanced ambient light patches for lighting conditions without color cast (no bias). (b) Biased ambient light patches for color-casted lighting conditions.

TABLE I
TRANSMISSION MAP ACCURACY

MSE (×10^-2)       DCP [4]   CAP [5]   DehazeNet [14]   Ours
No color cast        10.1       4.2          2.9         2.6
With color cast       8.8      10.5          9.5         7.9


A. Transmission Map Estimation

In the haze removal process, transmission estimation accuracy is the most dominant factor in dehazing performance. In the atmospheric scattering model (1), the transmission describes the portion of light that reaches the camera. When the light is scattered and the transmission attenuated, haze appears in the image. In underwater environments, this attenuation occurs under significantly biased ambient light, which produces color saturation in the hazy region. Therefore, transmission map estimation accuracy has a significant effect on original scene radiance estimation.

We compare the accuracy of the transmission estimated by various methods on 15K sample patches (exclusively selected from the training sets) under two different haze conditions. One haze model is without color cast (i.e., no bias in the ambient light, as in aerial images) and the other is with color cast in the ambient light (i.e., strong bias in the ambient light, as in underwater images). We synthetically generated two hazy image sets (Fig. 4), one with balanced ambient light and the other with biased ambient light.

Transmission map accuracy from the different methods is compared in Table I and Fig. 5.

Fig. 6. Comparison of the estimated transmission maps on a real underwater image: (a) original hazy image, (b) Carlevaris-Bianco [6], (c) He [4], (d) Zhu [5], (e) Cai [14], (f) proposed. Some results show promising performance, as in (b), (c), and (f). The others are unsuccessful at estimating the transmission underwater due to the highly saturated color regions, as shown in (d) and (e).

It should be noted that biased ambient light underwater disrupts the transmission estimation for some methods because they rely on balanced ambient light conditions. Table I compares the MSE between the estimated transmission and the ground truth under the two ambient light conditions. Fig. 5 presents the error statistics over the 15K test sample patches. Note that the proposed method shows the best accuracy in transmission map estimation. The performance of DehazeNet [14] and CAP [5] depends on the ambient light condition; both are competitive under balanced ambient light but fail under biased ambient light. We also observed that DCP [4] performs consistently regardless of color cast. We believe this is mainly because the bias in the ambient light does not affect the DCP values of the local patches. The proposed method outperforms DCP under balanced ambient light and still robustly estimates the transmission under biased ambient light.

A summarizing illustration of the transmission maps estimated by the different methods is shown in Fig. 6. Feasible transmission map estimates are obtained by He [4], Carlevaris-Bianco [6], and our method, while the others show insufficient performance due to the high color saturation in the water. As these methods heavily depend on RGB values, additional color correction is required to improve their performance (e.g., white balancing [22] and lαβ color correction [23]).

B. Dehazing of Real Underwater Images

We applied the trained algorithm to a set of real underwater images with different levels of haze. Typical underwater images with various color casts were used, as shown in Fig. 7. The six test images have various ambient light conditions.

Fig. 7 shows the dehazing results and the estimated ambient light for each method. Carlevaris-Bianco [6] shows good dehazing and color balance performance among the previously reported methods. This is because it uses a prior associated with the color-dependent attenuation of light specific to underwater environments. He [4] and Zhu [5] enhance the contrast of the dehazed images. However, as can be seen in the ambient light estimation row, the estimated ambient light is not accurate, as they merely compute it from the estimated transmission map. Cai [14] particularly fails to recover the scene radiance on color-casted underwater images because the algorithm assumes balanced ambient light.

Overall, the proposed dehazing network shows reliable performance for underwater images. Note that in this experiment both the transmission map and the ambient light were estimated from the proposed common convolutional architecture. These results show good dehazing performance for underwater environments.


Fig. 7. Comparison of dehazing results against other methods under various ambient light conditions: (a) original images, (b) Carlevaris-Bianco [6], (c) He [4], (d) Zhu [5], (e) Cai [14], (f) proposed. For each method, the small color box represents the estimated ambient light and the images below it show the dehazing results. Note that the best performance is shown in (b) and (f) regardless of the ambient light condition.

V. CONCLUSION

In this paper, we presented a CNN-based ambient light and transmission estimation framework with a common convolutional architecture for single image haze removal. We evaluated the performance of the proposed method on synthetic data and compared it with existing methods. We also evaluated the qualitative performance of the proposed method on real underwater images. The preliminary results show the promising dehazing ability of the proposed method.

ACKNOWLEDGMENT

This work is supported through a grant from KAIST via the High Risk High Return Project (Award #N11160085), NRF (Award #N01150984), and the Ministry of Land, Infrastructure and Transport's U-city program.

REFERENCES

[1] Y. Schechner and N. Karpel, "Recovery of underwater visibility and structure by polarization analysis," IEEE Journal of Oceanic Engineering, July 2005.

[2] S. G. Narasimhan and S. Nayar, "Interactive deweathering of an image using physical models," in IEEE Workshop on Color and Photometric Methods in Computer Vision, in conjunction with ICCV, October 2003.

[3] R. Fattal, "Single image dehazing," ACM Transactions on Graphics (TOG), vol. 27, no. 3, pp. 72:1–72:9, Aug. 2008.

[4] K. He, J. Sun, and X. Tang, "Single image haze removal using dark channel prior," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 12, pp. 2341–2353, Dec 2011.

[5] Q. Zhu, J. Mai, and L. Shao, "A fast single image haze removal algorithm using color attenuation prior," IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 3522–3533, Nov 2015.

[6] N. Carlevaris-Bianco, A. Mohan, and R. M. Eustice, "Initial results in underwater single image dehazing," in Proceedings of the IEEE/MTS OCEANS Conference and Exhibition, Sept 2010, pp. 1–8.

[7] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," arXiv preprint arXiv:1512.03385, 2015.

[8] T. Naseer, L. Spinello, W. Burgard, and C. Stachniss, "Robust visual robot localization across seasons using network flows," in Proceedings of the National Conference on Artificial Intelligence (AAAI), 2014.

[9] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 2014.

[10] J. Kim, J. K. Lee, and K. M. Lee, "Accurate image super-resolution using very deep convolutional networks," arXiv preprint arXiv:1511.04587, 2015.

[11] J. Sun, W. Cao, Z. Xu, and J. Ponce, "Learning a convolutional neural network for non-uniform motion blur removal," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 2015.

[12] L. Xu, J. S. Ren, C. Liu, and J. Jia, "Deep convolutional neural network for image deconvolution," in Advances in Neural Information Processing Systems, 2014, pp. 1790–1798.

[13] J. Mai, Q. Zhu, D. Wu, Y. Xie, and L. Wang, "Back propagation neural network dehazing," in Proceedings of the IEEE Conference on Robotics and Biomimetics, Dec 2014, pp. 1433–1438.

[14] B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao, "DehazeNet: An end-to-end system for single image haze removal," arXiv preprint arXiv:1601.07661, 2016.

[15] K. Tang, J. Yang, and J. Wang, "Investigating haze-relevant features in a learning framework for image dehazing," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2995–3002.

[16] C. Ancuti, C. O. Ancuti, T. Haber, and P. Bekaert, "Enhancing underwater images and videos by fusion," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 81–88.

[17] Y. Liu, S. Liu, and Z. Wang, "A general framework for image fusion based on multi-scale transform and sparse representation," Information Fusion, vol. 24, pp. 147–164, 2015.

[18] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, "Maxout networks," in Proceedings of the 30th International Conference on Machine Learning (ICML), Atlanta, GA, USA, June 2013, pp. 1319–1327.

[19] A. Handa, T. Whelan, J. McDonald, and A. Davison, "A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM," in Proceedings of the IEEE International Conference on Robotics and Automation, Hong Kong, China, May 2014, pp. 1524–1531.

[20] J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba, "SUN database: Large-scale scene recognition from abbey to zoo," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 3485–3492.

[21] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," in Proceedings of the ACM Conference on Multimedia, 2014, pp. 675–678.

[22] Y.-C. Liu, W.-H. Chan, and Y.-Q. Chen, "Automatic white balance for digital still camera," IEEE Transactions on Consumer Electronics, vol. 41, no. 3, pp. 460–466, 1995.

[23] G. Bianco, M. Muzzupappa, F. Bruno, R. Garcia, and L. Neumann, "A new color correction method for underwater imaging," The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 40, no. 5, p. 25, 2015.