

DETERMINING CAMERA RESPONSE FUNCTIONS FROM COMPARAGRAMS OF IMAGES WITH THEIR RAW DATAFILE COUNTERPARTS

Corey Manders, Steve Mann

University of Toronto, Dept. of Electrical and Computer Engineering

10 King’s College Rd., Toronto, Canada

ABSTRACT

Many digital cameras now have an option of raw data output. We first show, by way of superposigrams, that this raw data output is quantimetrically (i.e. in terms of the camera’s response to light) linear, in the case of the Nikon D2H digital SLR camera. Next, we perform comparametric analysis on compressed images together with their corresponding raw data images in order to determine the camera’s response function.

1. INTRODUCTION: TYPICAL CAMERAS AND TRADITIONAL IMAGE PROCESSING

Most cameras do not provide an output that varies linearly with light input. Instead, most cameras contain a dynamic range compressor, as illustrated in Fig. 1. Historically, the dynamic range

Fig. 1: Typical camera and display: light from subject matter passes through a lens (typically approximated with simple algebraic projective geometry, e.g. an idealized “pinhole”) and is quantified in units “q” by a sensor array where noise n_q is also added, to produce an output which is compressed in dynamic range by a typically unknown function f. Further noise n_f is introduced by the camera electronics, including quantization noise if the camera is a digital camera and compression noise if the camera produces a compressed output such as a JPEG image, giving rise to an output image f_1(x, y). The apparatus that converts light rays into f_1(x, y) is labelled CAMERA. The image f_1 is transmitted or recorded and played back into a DISPLAY system where the dynamic range is expanded again. Most cathode ray tubes exhibit a nonlinear response to voltage, and this nonlinear response is the expander. The block labelled “expander” is therefore not usually a separate device. Typical print media also exhibit a nonlinear response that embodies an implicit “expander”.

compressor in video cameras arose because it was found that televisions did not produce a linear response to the video signal. In particular, it was found that early cathode ray screens provided a light output approximately equal to voltage raised to the exponent of 2.5. Rather than build a circuit into every television to compensate for this nonlinearity, a partial compensation (exponent of 1/2.22) was introduced into the television camera at much lesser cost, since there were far more televisions than television cameras in those days.

Coincidentally, the logarithmic response of human visual perception is approximately the same as the inverse of the response of a television tube (e.g. human visual response turns out to be approximately the same as the response of the television camera) [1]. For this reason, processing done on typical video signals will be on a perceptually relevant tone scale. Moreover, any quantization of such a video signal (e.g. quantization into 8 bits) will be close to ideal in the sense that each step of the quantizer will have associated with it a roughly equal change in perceptual units.

Most still cameras also provide dynamic range compression built into the camera. For example, the Nikon D2H camera captures internally in 12 bits (per pixel per color) and then applies dynamic range compression, and finally outputs the range-compressed images in 8 bits (per pixel per color). Fortunately, the Nikon D2H camera also allows output of images in a non-range-compressed 12-bit (per pixel per color) format.

1.1. Why Stockham was wrong

When video signals are processed using linear filters, there is an implicit homomorphic filtering operation on the photoquantity (a measure of the quantity of light present on a sensor array element [2]). As should be evident from Fig. 1, operations of storage, transmission, and image processing take place between approximately reciprocal nonlinear functions of dynamic range compression and dynamic range expansion.

Many users of image processing methodology are unaware of this fact, because there is a common misconception that cameras produce a linear output and that displays respond linearly. In fact, there is a common misconception that nonlinearities in cameras and displays arise from defects and poor quality circuits, when in actual fact these nonlinearities are fortuitously present in display media and deliberately present in most cameras. Thus the effect of processing signals such as f_1 in Fig. 1 with linear filtering is, whether one is aware of it or not, homomorphic filtering. Stockham advocated a kind of homomorphic filtering operation in which the logarithm of the input image was taken, followed by linear filtering (e.g. linear space invariant filters), followed by taking the antilogarithm [3].

In essence, what Stockham didn’t appear to realize is that such homomorphic filtering is already manifest in simply doing ordinary linear filtering on ordinary picture signals (whether from video, film, or otherwise). In particular, the compressor gives an image f_1 = f(q) = q^(1/2.22) = q^0.45 (ignoring noise n_q and n_f), which has approximately the same effect as f_1 = f(q) = log(q + 1) (e.g. roughly the same shape of curve, and roughly the same effect, e.g. to brighten the mid-tones of the image prior to processing). Similarly, a typical video display has the effect of undoing (approximately) this compression, e.g. darkening the mid-tones of the image after processing, with q̂ = f̃^(−1)(f_1) = f_1^2.5. Thus in some sense what Stockham did, without really realizing it, was to apply dynamic range compression to already range compressed images, then do linear filtering, then apply dynamic range expansion to images being fed to already expansive display media.
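The claimed similarity between the power-law compressor and a logarithmic curve can be checked numerically. The sketch below (not from the paper) rescales log(q + 1) to the same [0, 1] range and confirms that both curves brighten the mid-tones:

```python
import numpy as np

# Sketch (not from the paper): compare f(q) = q^0.45 with a rescaled
# logarithmic curve log(q + 1) / log(2) on q in [0, 1]. Both map the
# interval [0, 1] onto itself and lift mid-tones above the identity line.
q = np.linspace(0.0, 1.0, 101)
power_curve = q ** 0.45
log_curve = np.log1p(q) / np.log(2.0)

# Both curves brighten a mid-tone such as q = 0.25:
print(power_curve[25] > 0.25, log_curve[25] > 0.25)
```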

1.2. On the value of doing the exact opposite of what Stockham advocated

There exist certain kinds of image processing for which it is preferable to operate linearly on the photoquantity q. Such operations include sharpening of an image to undo the effect of the point spread function (PSF) blur of a lens, or to increase the camera’s gain retroactively. We may also add two or more differently illuminated images of the same subject matter if the processing is done in photoquantities. What is needed in these forms of photoquantigraphic image processing is an anti-homomorphic filter. The manner in which an anti-homomorphic filter is inserted into the image processing path is shown in Fig. 2.

Fig. 2: The anti-homomorphic filter: Two new elements, f̂^(−1) and f̂, have been inserted, as compared to Fig. 1. These are estimates of the inverse and forward nonlinear response function of the camera. Estimates are required because the exact nonlinear response of a camera is generally not part of the camera specifications. (Many camera vendors do not even disclose this information if asked.) Because of noise in the signal f_1, and also because of noise in the estimate of the camera nonlinearity f, what we have at the output of f̂^(−1) is not q, but, rather, an estimate, q̃. This signal is processed using linear filtering, and then the processed result is passed through the estimated camera response function, f̂, which returns it to a compressed tone scale suitable for viewing on a typical television, computer, or the like, or for further processing.
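A minimal sketch of this anti-homomorphic path, assuming (purely for illustration) a power-law estimate f̂(q) = q^0.45 of the camera response; the true f̂ must be estimated as described later in the paper:

```python
import numpy as np

# Sketch of the anti-homomorphic filtering path of Fig. 2, with an
# assumed power-law estimate fhat(q) = q^0.45 of the camera response.
GAMMA = 0.45

def fhat(q):
    # estimated forward response: photoquantity -> image value
    return np.clip(q, 0.0, 1.0) ** GAMMA

def fhat_inv(f1):
    # estimated inverse response: image value -> photoquantity
    return np.clip(f1, 0.0, 1.0) ** (1.0 / GAMMA)

def photoquantigraphic_add(img_a, img_b):
    """Superimpose two differently lit pictures of the same subject:
    invert to photoquantities, add, renormalize, and re-compress."""
    q = fhat_inv(img_a) + fhat_inv(img_b)
    return fhat(q / q.max())

a = np.array([0.2, 0.5, 0.8])  # hypothetical range-compressed values in [0, 1]
b = np.array([0.3, 0.4, 0.6])
print(photoquantigraphic_add(a, b))
```

The renormalization before re-compression is one possible choice; clipping q at 1 would model sensor saturation instead.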

Previous work has dealt with the insertion of an anti-homomorphic filter in the image processing chain. However, in the case of using a camera in which the raw 12-bit data is available, processing using the raw data (NEF files) may proceed as shown in figure 3.

Fig. 3: A modified method of photoquantimetric image processing (shown in figure 2), in which the raw data is available, and consequently no anti-homomorphic filter is necessary. Moreover, a comparison (e.g. comparametric analysis) between compressed and raw data is possible.

2. A SIMPLE CAMERA MODEL

While the geometric calibration of cameras is widely practiced and understood [4], often much less attention is given to the camera’s quantimetric response function (the manner, neither radiometric nor photometric, in which the camera responds to light [5, 6, 7]). In digital cameras, the camera response function maps the actual quantity of light impinging on each element of the sensor array to the pixel values that the camera outputs.

Linearity (which is typically not exhibited by most camera response functions) implies the following two conditions:

1. Homogeneity: A function is said to exhibit homogeneity if and only if f(ax) = af(x), for all scalars a.

2. Superposition: A function is said to exhibit superposition if and only if f(x + y) = f(x) + f(y).

In image processing, homogeneity arises when we compare differently exposed pictures of the same subject matter. Superposition arises when we superimpose (superpose) pictures taken from differently illuminated instances of the same subject matter, using a simple law of composition such as addition (i.e. using the property that light is additive).
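The two conditions can be checked numerically on candidate response functions. In this sketch, a scaled identity (standing in for a linear raw-data response) passes both tests, while a hypothetical gamma compressor fails both:

```python
# Sketch: numerically testing the homogeneity and superposition
# conditions on candidate response functions.
def homogeneous(f, x=0.5, a=2.0, tol=1e-9):
    # f(a x) == a f(x)?
    return abs(f(a * x) - a * f(x)) < tol

def superposes(f, x=0.3, y=0.4, tol=1e-9):
    # f(x + y) == f(x) + f(y)?
    return abs(f(x + y) - (f(x) + f(y))) < tol

linear = lambda q: 3.0 * q   # a linear (raw-data-like) response
gamma = lambda q: q ** 0.45  # a range-compressing response

print(homogeneous(linear), superposes(linear))  # True True
print(homogeneous(gamma), superposes(gamma))    # False False
```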

A variety of techniques have been proposed to recover camera response functions, such as using test patterns of known reflectance, and using different exposures of the same subject matter [5][6][7]. Recently, a method using a superposigram [8] was used. The method differed from other methods in that it did not require the use of test patterns, nor a camera that was capable of adjusting its exposure.

The following technique is used: in a dark environment, set up two distinct light sources. Take three pictures, one with each light on individually (p_a, p_b), and one with the two lights on together (p_c).

3. THE SUPERPOSIGRAM AND LINEARITY

From the three images taken in the method described, we may form a superposigram. To do this, each pixel location is considered in the three images. Note that this may be done using both raw data files and range compressed files (such as PPM or JPEG), which are available from virtually all digital cameras. For each pixel location there exist three values, one value from each of the three images. If the range of the data is relatively low (such as x ∈ [0, 255], x ∈ N, in the case of typical pixels), a three dimensional array may be used to store the data as a superposigram. To do this, the array is initialized to all 0. For each pixel position, the three values from the three images become an index into the three dimensional array. The bin corresponding to this index is incremented. The procedure is repeated for each pixel position. This yields a superposigram.
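For 8-bit data, the procedure above can be sketched directly with a dense 256 × 256 × 256 bin array (NumPy is used for illustration; the pixel values shown are hypothetical):

```python
import numpy as np

# Sketch of superposigram construction for 8-bit images: three
# co-registered pictures (light a only, light b only, both lights)
# index a 256 x 256 x 256 array of bin counts.
def superposigram(pa, pb, pc):
    """pa, pb, pc: uint8 arrays of identical shape."""
    bins = np.zeros((256, 256, 256), dtype=np.uint32)
    # each pixel position increments the bin at (pa, pb, pc)
    np.add.at(bins, (pa.ravel(), pb.ravel(), pc.ravel()), 1)
    return bins

pa = np.array([[10, 20]], dtype=np.uint8)  # hypothetical pixel values
pb = np.array([[30, 40]], dtype=np.uint8)
pc = np.array([[40, 60]], dtype=np.uint8)
s = superposigram(pa, pb, pc)
print(s[10, 30, 40], s[20, 40, 60])  # 1 1
```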

The situation becomes slightly more complicated when dealing with raw data. If the complete superposigram structure were to be constructed as a typical array, at least 128 gigabytes would be needed for storage. Of course, much of the array will remain zero after the superposigram has been constructed. For this reason, superposigram data was stored in point-dictionary form as (x, y, x + y, count). To efficiently store this data, a structure similar to a hash table was used. The C code used to perform this task is available at http://comparametric.sourceforge.net, and is freely distributable under the GNU license.
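A Python analogue of that point-dictionary storage (the paper’s implementation is in C; this dict-based sketch only illustrates the idea, and the sample values are hypothetical) might look like:

```python
from collections import defaultdict

# Sketch of sparse point-dictionary storage for 12-bit raw data, where
# a dense 4096^3 array would be infeasible: keys are (x, y, x + y)
# triples observed in the three images, values are bin counts.
def sparse_superposigram(pa, pb, pc):
    bins = defaultdict(int)
    for triple in zip(pa, pb, pc):
        bins[triple] += 1
    return bins

# hypothetical 12-bit sample values from the three exposures:
pa = [100, 2000, 100]
pb = [300, 1500, 300]
pc = [400, 3500, 400]
print(dict(sparse_superposigram(pa, pb, pc)))
# {(100, 300, 400): 2, (2000, 1500, 3500): 1}
```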

If the camera response of the raw data is truly linear, then the values on the third axis should be the sum of the corresponding values on the first and second axes. That is to say, the superposigram should define a plane of the form:

q_1 + q_2 − q_{1+2} = 0.    (1)


Here q is the photoquantimetric value recorded from the raw data of the sensor. The superposigram resulting from this situation is shown in figure 4.

Fig. 4: A superposigram from the raw data of a Nikon D2H digital SLR camera. The points which exhibit clipping (the data value has reached its maximum possible value) have been removed to simplify the plot and more clearly demonstrate the linearity of the data. The data used to form the superposigram is shown in figure 5.

4. THE SUPERPOSIGRAM AND TYPICAL CAMERA RESPONSE FUNCTIONS

Unlike the linear response present in the raw data of the Nikon D2H camera, the data which is typically available as JPEGs from cameras is non-linear. This is immediately apparent in viewing the superposigram constructed using the JPEG data from a camera. Unlike the plane shown in figure 4, the superposigram is a convex surface, as shown in figure 6.

Though the superposigram may be used to solve for the response function, as shown in [8], if the raw data is available (as is the case with the Nikon D2H), the response function is easily found by noticing that the comparagram [9][5] between the raw linear data and the range compressed data (such as the decompressed JPEG images) is the camera response function. One expects the non-linearity when working with pixels from a JPEG or PPM image. In particular, if the pixels of a typical image were doubled, we do not expect the same result as doubling the exposure time of the image or increasing the f-stop by a factor of √2.

5. CALCULATING THE RESPONSE FUNCTION AND DETERMINING ERROR

As mentioned, the comparagram between the raw data and the range compressed data is the response function of the camera. In detail, the following may be done: an array of dimensions 4096 × 256 is created and initialized to 0. The dimensions are such because the raw data from the camera is 12 bits per pixel whereas the range

Fig. 5: One of the many datasets used in the computation of the superposigram. Leftmost: Picture in Deconism Gallery with only the upper lights turned on. Middle: Picture with only the lower lights turned on. Rightmost: Picture with both the upper and lower lights turned on together.

Fig. 6: A superposigram from the non-linear, range compressed data from the JPEG output. The data used to form the superposigram is shown in figure 5. As expected, the superposigram is a convex surface rather than a plane.

compressed data is 1 byte. Each pixel position in the two images becomes a coordinate into the array. For each pixel position, the corresponding element is incremented. In the case of a Nikon D2H digital camera, the result is shown in figure 7.
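The 4096 × 256 comparagram described above can be sketched as a joint histogram (the sample values are hypothetical; raw pixels range over [0, 4095] and compressed pixels over [0, 255]):

```python
import numpy as np

# Sketch of the comparagram between 12-bit raw data and 8-bit range
# compressed data: each pixel position contributes one count at the
# bin (raw value, compressed value) in a 4096 x 256 array.
def comparagram(raw, compressed):
    bins = np.zeros((4096, 256), dtype=np.uint32)
    np.add.at(bins, (raw.ravel(), compressed.ravel()), 1)
    return bins

raw = np.array([0, 1000, 1000, 4095])  # hypothetical raw values
comp = np.array([0, 180, 180, 255])    # corresponding JPEG values
c = comparagram(raw, comp)
print(c[1000, 180], c[4095, 255])  # 2 1
```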

To simplify the computation of the response function, the 12-bit data may be reduced to 8-bit data by dividing by 16 and rounding, to retain the linearity of the data. The comparagram procedure may once again be repeated (this time with a 256 × 256 array), to produce a simplified version of the response function. A very good approximation to the response function may be found by composing a discrete function from the maximum bin counts across the rows of the comparagram. This simplified comparagram, along with the resulting discrete function, is shown in figure 8.
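Extracting the discrete response function from the simplified 256 × 256 comparagram then amounts to an argmax across each row. The toy comparagram below concentrates its counts along an assumed gamma-like curve:

```python
import numpy as np

# Sketch: recover the discrete response function by taking, for each
# raw value (row of the comparagram), the compressed value holding the
# maximum bin count.
def response_from_comparagram(cg):
    return np.argmax(cg, axis=1)

# Toy 256 x 256 comparagram whose mass lies on a gamma-like curve:
cg = np.zeros((256, 256), dtype=np.uint32)
raw = np.arange(256)
comp = np.round(255.0 * (raw / 255.0) ** 0.45).astype(int)
cg[raw, comp] = 10  # dominant bins along the curve
f = response_from_comparagram(cg)
print(f[0], f[255])  # 0 255
```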

Fig. 8: A range-reduced comparagram of the raw camera data against the range compressed data, with the discrete camera response function plotted as the maximal bin counts of the comparagram.

5.1. Confirming the correctness of the camera response function by homogeneity

The first measure described is termed a homogeneity-test of the camera response function (regardless of how it was obtained). The homogeneity-test requires two differently exposed (by a scalar factor of k) pictures, f(q) and f(kq), of the same subject matter.


Fig. 7: The 256 by 4096 comparagram of the raw and range compressed data.

Method used to determine the response function          | Superposition Error | Homogeneity Error
Direct from Raw Data                                    | 7.2018              | 8.1201
Homogeneity with parametric solution (Previous Work [9])| 8.8096              | 9.9827
Homogeneity, direct solution                            | 8.6751              | 9.4011
Superposition, direct solution                          | 8.5450              | 9.5361

Table 1: This table shows the per-pixel errors observed in using lookup tables arising from several methods of calculating f and f^(−1). The leftmost column denotes the method used to determine the response function. The middle column denotes how well the resulting response function superimposes images, based on testing the candidate response function on pictures of subject matter taken under different lighting positions. The rightmost column denotes how well the resulting response function amplitude-scales images, and was determined based on using differently exposed pictures of the same subject matter. The entries in the rightmost two columns are mean squared error divided by the number of pixels in an image.

To conduct the test, the dark image f(q) is lightened, and then tested to see how close it is (in the mean squared error sense) to f(kq). The mean-squared difference is termed the homogeneity error. To lighten the dark image, it is first converted from imagespace, f, to lightspace, q, by computing f^(−1)(f(q)). Then the photoquantities q are multiplied by a constant value, k. Finally, we convert back to imagespace by applying f. Alternatively, we could apply f^(−1) to both images, multiply the first by k, and compare them in lightspace (as photoquantities).
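The homogeneity-test can be sketched as follows, using an assumed power-law response f(q) = q^0.45 in place of the recovered lookup table; a correct candidate drives the error toward zero:

```python
import numpy as np

# Sketch of the homogeneity-test with an assumed power-law response.
GAMMA = 0.45
f = lambda q: q ** GAMMA              # candidate response
f_inv = lambda v: v ** (1.0 / GAMMA)  # its inverse

def homogeneity_error(dark, light, k):
    """dark = f(q), light = f(k q): lighten `dark` in lightspace,
    re-compress, and return the mean squared difference."""
    lightened = f(np.clip(k * f_inv(dark), 0.0, 1.0))
    return np.mean((lightened - light) ** 2)

q = np.array([0.1, 0.2, 0.4])  # hypothetical photoquantities
dark, light = f(q), f(2.0 * q)
print(homogeneity_error(dark, light, 2.0))  # ~0 for the correct f
```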

5.2. Confirming the correctness of the camera response function by superposition

Another test of a camera response function, termed the superposition-test, requires three pictures p_a = f(q_a), p_b = f(q_b) and p_c = f(q_{a+b}). The inverse response function is applied to p_a and p_b, and the resulting photoquantities q_a and q_b are added. We then compare this sum (in either imagespace or lightspace) with p_c (or q_c). The resulting mean squared difference is the superposition error.
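A matching sketch of the superposition-test, again with an assumed power-law response standing in for the recovered function, compares the quantities in lightspace:

```python
import numpy as np

# Sketch of the superposition-test: invert pa and pb, add the
# photoquantities, and compare against the inverse of pc in lightspace.
GAMMA = 0.45
f = lambda q: q ** GAMMA
f_inv = lambda v: v ** (1.0 / GAMMA)

def superposition_error(pa, pb, pc):
    qa, qb, qc = f_inv(pa), f_inv(pb), f_inv(pc)
    return np.mean((qa + qb - qc) ** 2)

qa = np.array([0.1, 0.2])  # hypothetical photoquantities
qb = np.array([0.3, 0.1])
pa, pb, pc = f(qa), f(qb), f(qa + qb)
print(superposition_error(pa, pb, pc))  # ~0 for the correct f
```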

5.3. Comparing homogeneity and superposition errors in response functions found by various methods

The results of comparing homogeneity and superposition errors in response functions found by various methods (including previously published work) are given in Table 1. As expected, the direct method using the raw data produces the lowest error. Note, however, that the error is not 0, due to the noise imposed primarily by the lossy compression of the JPEG data.

6. ACKNOWLEDGMENTS

The authors would like to thank Nikon Camera for their various donations of digital cameras, lenses, and funding.

7. CONCLUSION

In this paper we showed how an unknown nonlinear camera response function can be recovered using homogeneity and/or superposition properties of light. The easiest method to implement, which also gives rise to the lowest error (as evaluated for both homogeneity and superposition), was to simply compute a comparagram between a range compressed image and its raw datafile counterpart, available on many cameras. Rather than using test charts, or minimizing a sum of squares error resulting from the camera’s non-linearity, the method relied on the comparagram, a very simple data structure presented in earlier work, to solve for the function directly. The method may also be used as a baseline for other methods which solve for the response function indirectly when raw data is also available.

8. REFERENCES

[1] Charles Poynton, A Technical Introduction to Digital Video, John Wiley & Sons, 1996.

[2] Steve Mann, Intelligent Image Processing, John Wiley and Sons, November 2, 2001, ISBN: 0-471-40637-6.

[3] T. G. Stockham, Jr., “Image processing in the context of a visual model,” Proc. IEEE, vol. 60, no. 7, pp. 828–842, July 1972.

[4] E. Trucco and A. Verri, Introductory Techniques for 3-D Computer Vision, Prentice Hall, NJ, 1998.

[5] F. M. Candocia, “A least squares approach for the joint domain and range registration of images,” IEEE ICASSP, vol. IV, pp. 3237–3240, May 13-17, 2002, avail. at http://iul.eng.fiu.edu/candocia/Publications/Publications.htm.

[6] S. Mann and R. Mann, “Quantigraphic imaging: Estimating the camera response and exposures from differently exposed images,” CVPR, pp. 842–849, December 11-13, 2001.

[7] S. Mann, “Compositing multiple pictures of the same scene,” in Proceedings of the 46th Annual IS&T Conference, Cambridge, Massachusetts, May 9-14, 1993, The Society of Imaging Science and Technology, pp. 50–52, ISBN: 0-89208-171-6.

[8] C. Aimone, C. Manders, and S. Mann, “Camera response function recovery from different illuminations of identical subject matter,” IEEE ICIP 2004, to be published, 2004.

[9] S. Mann, “Comparametric equations with practical applications in quantigraphic image processing,” IEEE Trans. Image Proc., vol. 9, no. 8, pp. 1389–1406, August 2000, ISSN 1057-7149.