
Automatic Generation of Stereogram by Using Stripe Backgrounds

Atsushi Yamashita†, Akihisa Nishimoto†, Toru Kaneko† and Kenjiro T. Miura†
†Department of Mechanical Engineering, Shizuoka University, 3–5–1 Johoku, Hamamatsu-shi, Shizuoka 432–8561, Japan

voice: [+81](53)478–1067; fax: [+81](53)478–1067; e-mail: [email protected]

www: http://sensor.eng.shizuoka.ac.jp/~yamasita

Abstract

In this paper, we propose a new method that can generate stereograms (stereoscopic photographs) automatically by compositing foreground images and background images with chromakey using a two-tone stripe background. For the composition of foreground and background images, chromakey is usually used. However, conventional chromakey uses a unicolored blue background and has the problem that a person's clothes are regarded as background if their colors are the same. We therefore utilize the adjacency condition between the two-tone striped areas on the background and extract the foreground regions whose colors are the same as the background. In addition, a stereogram consists of an image for the left eye and one for the right eye. The relationships between foreground objects and background objects in the composite images must be consistent, and the relationships of objects between the left image and the right one must also be consistent. Therefore, the foreground images and the background images are merged by considering these geometric constraints.

Keywords: Image composition, stereogram, chromakey, region extraction

1 Introduction

In this paper, we propose a new method that can generate stereograms (stereoscopic photographs) automatically by compositing foreground images and background images with chromakey using a two-tone stripe background.

The composition of images [1] plays a significant role in creative designs such as cinema films, magazine covers, promotion videos, and so on. This technique can combine images of actors or actresses in a studio with images of scenery taken in another place. Additionally, the automatic generation of stereograms has become very important for virtual reality in recent years. A stereogram consists of an image for the left eye and one for the right eye [2]. Although the images themselves give only a two-dimensional (x, y) representation, we can obtain the third dimension (depth z) from the left and right images, because there is a disparity between them and humans can perceive depth from images that have proper disparities.

Therefore, when we generate stereograms, we have to solve the following problems:

1. how to extract foreground objects.

2. how to composite foreground object images and background scenery images.

As to the former problem, the extraction of foreground objects, there are many studies on image segmentation [3, 4], e.g., pixel-based, area-based, edge-based, and physics-based segmentation. For example, the snake [5] is a popular technique based on edge detection. However, automatic methods for arbitrary backgrounds have not been developed to a practical level. Additionally, robust methods are needed, especially for live TV programs [6, 7].

Therefore, an image composition method called chromakey, which can segment human objects from a uniform-color background and superimpose them on another background electronically, has been proposed. This technique has been used for many years in the TV and film industries.


Figure 1: Region extraction with chromakey techniques using several types of backgrounds. (a) Unicolor background. (b) Multi-background [8]. (c) Striped background (our proposed method).

The conventional chromakey technique uses a background whose color is solid blue or green, and extracts the regions in the image whose colors are not the same as the background. Although this technique can be used successfully, several problems associated with it still need a solution.

One of the most critical unsolved problems is that if a person's clothes and the background have the same color, the former is regarded as the latter (Figure 1(a)). Smith and Blinn proposed the blue screen matting method, which allows the foreground object to be shot against two backing colors [8] (Figure 1(b)). This method can extract foreground regions whose colors are the same as the background color. However, this multi-background technique cannot be used for live actors or moving objects because of the requirement for repeatability.

In this paper, we propose a new method that makes it possible to segment objects from a background precisely, even if the objects have the same color as the background, by using a two-tone striped background (Figure 1(c)). In our method, the adjacency condition between the two-tone striped areas on the background is utilized to extract the foreground regions whose colors are the same as the background [9].

As to the latter problem, the composition of foreground object images and background scenery images, the geometric constraints between them must be considered.

Figure 2: Image composition of foreground objects and background scenery. (a) Left image with size strangeness. (b) Right image with size strangeness. (c) Left image with disparity strangeness. (d) Right image with disparity strangeness.

The relationships between foreground objects and background objects in the composite images must be consistent. When the size of the foreground object and that of the background objects are not consistent, the composite images look unnatural (Figure 2(a)(b)). The relationships of objects between the left image and the right one must also be consistent. When the disparity is incorrect, the composite left and right images look strange (Figure 2(c)(d)).

Therefore, the geometric constraints are taken into account by utilizing the relationships between characteristic points in the left and right images, and between the foreground and background images, respectively.

2 Stereogram Generation

The color I(u, v) of a composite image at a pixel (u, v) is defined as:

I(u, v) = α(u, v)F(u, v) + (1 − α(u, v))B(u, v),   (1)

where F(u, v) and B(u, v) are the foreground and the background color, respectively, and α(u, v) is the alpha key value at a pixel (u, v) [1].

The color at a pixel (u, v) is the same as the foreground when α(u, v) equals 1, and the same as the background when α(u, v) equals 0. In other words, region extraction with a chromakey technique is the problem of extracting the alpha key at each pixel (u, v).

The alpha key value is ordinarily estimated by a color space approach (e.g., [10]) or an image space approach (e.g., [5]). However, the foreground cannot be extracted using only these two approaches when the colors of foreground objects are the same as that of the background. The same is true for the two-tone striped backgrounds that we propose. Therefore, the boundaries where the color of the striped background changes are utilized for detecting the foreground regions whose color is the same as the background [9].
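
As a concrete illustration of the compositing model in Eq. (1), the following minimal sketch blends a foreground and a background with a per-pixel alpha key; the array names and the 8-bit RGB/NumPy assumption are ours, not the paper's.

import numpy as np

def composite(alpha: np.ndarray, foreground: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Per-pixel blend I = alpha*F + (1 - alpha)*B for 8-bit RGB images (Eq. (1))."""
    a = alpha.astype(np.float32)[..., np.newaxis]   # (H, W, 1) so it broadcasts over the RGB channels
    f = foreground.astype(np.float32)
    b = background.astype(np.float32)
    out = a * f + (1.0 - a) * b
    return np.clip(out, 0, 255).astype(np.uint8)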

In our method, a baseline stereo camera that consists of a left camera and a right camera is used for taking the foreground left and right images and the background left and right images, respectively.

The procedure of our proposed method consists of four steps: 1) background color extraction, 2) striped region extraction, 3) foreground extraction, and 4) image composition.

2.1 Background Color Extraction

Candidate regions of the background are extracted by using a color space approach. Let C1 and C2 be the colors of the two-tone background in the images acquired from the camera, and R1 and R2 be the regions whose colors are C1 and C2, respectively. Ri (i = 1, 2) is defined as:

Ri = {(u, v) | F(u, v) ∈ Ci},   (2)

where F(u, v) is the color of the image at a pixel (u, v).

After the human operator decides C1 and C2 by trial and error, R1 and R2 are extracted, respectively (Figure 3(a)(b)).
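
A minimal sketch of this color-space extraction, assuming the operator-chosen classes C1 and C2 are approximated by a nominal RGB value plus a tolerance; the nominal colors and the tolerance below are assumptions, not values from the paper.

import numpy as np

def color_mask(image: np.ndarray, color, tol: int = 30) -> np.ndarray:
    """Boolean mask of pixels whose RGB value lies within `tol` of `color`
    (a stand-in for the operator-tuned color classes C1, C2 in Eq. (2))."""
    diff = np.abs(image.astype(np.int16) - np.asarray(color, dtype=np.int16))
    return np.all(diff <= tol, axis=-1)

C1 = (0, 0, 255)      # assumed nominal blue of the stripes
C2 = (255, 255, 0)    # assumed nominal yellow of the stripes
# R1 = color_mask(image, C1); R2 = color_mask(image, C2)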

2.2 Striped Region Extraction

The background mainly consists of three regions: R1, R2, and R3. R3 is the intermediate region between R1 and R2 (Figure 3(b)). It is difficult to extract R3 by a color space approach because the color of R3 may be a composite of C1 and C2 rather than either C1 or C2. Therefore, R3 is extracted from adjacency conditions in our method.

Figure 3: Overview of stereogram generation. (a) Original image. (b) Region segmentation without using the information of the striped area. (c) Foreground extraction without using the information of the striped area. (d) Region segmentation by using the information of the striped area. (e) Foreground extraction by using the information of the striped area. (f) Image composition of left and right images.

The difference between R3 and the foreground is that R3 contacts both R1 and R2. In the case of horizontal stripes, the colors of the regions above and below R3 differ from each other. Therefore, R3 is extracted by searching for the area between R1 and R2 in the longitudinal (vertical) direction.

After searching R3, the foreground region R4 is determined as follows:

R4 = {(u, v) | F(u, v) ∉ (C1 ∪ C2), (u, v) ∉ R3}.   (3)

In this procedure, all regions whose colors are the same as the background are extracted as background. An example of such an extraction error is shown in Figure 3(c). These errors must be corrected in the next step.
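
The following sketch illustrates one way to implement this column-wise adjacency search for horizontal stripes; the masks r1 and r2 come from the previous step, and the maximum transition width max_gap is an assumed parameter, not one specified in the paper.

import numpy as np

def extract_r3(r1: np.ndarray, r2: np.ndarray, max_gap: int = 5) -> np.ndarray:
    """Mark thin vertical gaps bounded by R1 on one side and R2 on the other
    as the stripe-transition region R3 (column-wise scan for horizontal stripes)."""
    h, w = r1.shape
    r3 = np.zeros((h, w), dtype=bool)
    for u in range(w):
        col1, col2 = r1[:, u], r2[:, u]
        known = col1 | col2
        v = 0
        while v < h:
            if known[v]:
                v += 1
                continue
            start = v
            while v < h and not known[v]:
                v += 1
            end = v                                   # gap is the half-open range [start, end)
            if end - start <= max_gap and start > 0 and end < h:
                # the gap counts as R3 only if R1 lies on one side and R2 on the other
                if (col1[start - 1] and col2[end]) or (col2[start - 1] and col1[end]):
                    r3[start:end, u] = True
    return r3

# Eq. (3): the provisional foreground is whatever is neither background colored nor R3.
# r4 = ~(r1 | r2 | r3)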

2.3 Foreground Extraction

The boundary between the foreground and the background is detected in order to recheck the foreground regions whose color is the same as the background, and the background regions whose color is the same as the foreground.


Figure 4: Boundary detection of the foreground.

Let R5 be the region whose color is the same as the background and which lies inside the foreground, and R6 be the corresponding region on the outline of the foreground (Figure 3(d)). To detect R5 and R6, the adjacency conditions with R3 are utilized (Figure 4).

The difference between R1 or R2 and R5 is whether the region contacts R3 or not, even though their colors are the same. This is because no two-tone striped region whose colors are the same as the background exists inside the foreground objects. Therefore, the regions among R1 and R2 that do not contact R3 can be judged as R5.
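
A sketch of this test using connected-component labeling; scipy.ndimage is an assumed implementation choice, not one named in the paper, and the sketch covers only the R5 part of the check.

import numpy as np
from scipy import ndimage

def find_r5(r12: np.ndarray, r3: np.ndarray) -> np.ndarray:
    """Keep the connected components of R1 ∪ R2 that never touch R3;
    by the adjacency argument above, those components lie inside the foreground (R5)."""
    labels, _ = ndimage.label(r12)                 # connected components of the background-colored mask
    near_r3 = ndimage.binary_dilation(r3)          # "touches R3" = overlaps a 1-pixel dilation of R3
    touching = np.unique(labels[near_r3 & (labels > 0)])
    return (labels > 0) & ~np.isin(labels, touching)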

To determine the R6 regions, the endpoints of R3 are utilized. The R3 regions become horizontal lines if the stripes and the scanlines of the camera are perfectly horizontal. However, this is not always the case. Therefore, an approximate line for each R3,j region is calculated, and the endpoints of these approximate lines are detected. The neighboring areas of the endpoints that are adjacent to R6 have the same color as C1 or C2, whereas the foreground next to the other endpoints has a color different from C1 or C2. After the endpoints next to R1 or R2 are identified, these points are connected smoothly with interpolation lines. To detect the boundary of R6, each pixel on an interpolation line is regarded as R4. As a result of this procedure, the region surrounded by the interpolation lines and R4 becomes a region that, like R5, does not contact R3. Therefore, R6 can be obtained in the same way as R5.

On the other hand, the endpoints that do not contact R1 and R2 are not connected with interpolation lines, and the boundaries of R4 are preserved precisely. In this way, the distinction between the R4 region and the R6 region can be made easily.

After that, the image is divided into six regions (Figure 3(d)) and the extraction procedure finishes. Accordingly, the foreground region Rfg of the image is obtained as follows (Figure 3(e)):

Rfg = {(u, v) | (u, v) ∈ (R4 ∪ R5 ∪ R6)}.   (4)

2.4 Image Composition

The conditions under which the images are taken by the baseline stereo camera are estimated from the images themselves in order to generate stereograms naturally. The optical axes of the two cameras are parallel. Therefore, the 3-D coordinates (X, Y, Z) of corresponding points between the two images in the camera coordinate system are expressed as follows:

X = b(x_{i,l} + x_{i,r}) / (2d_i),   (5)

Y = b(y_{i,l} + y_{i,r}) / (2d_i),   (6)

Z = bf / d_i,   (7)

where (x_{i,l}, y_{i,l}) and (x_{i,r}, y_{i,r}) are the coordinates of the characteristic point i in the left and right images, respectively, f is the image distance, b is the length of the baseline (the distance between the two cameras), and d_i = x_{i,l} − x_{i,r} is the disparity of the corresponding point i.
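
A minimal sketch of this parallel-axis triangulation, following Eqs. (5)-(7); the function and variable names are ours, not the paper's.

def triangulate(xl, yl, xr, yr, b, f):
    """Recover the camera-frame point (X, Y, Z) of a left/right correspondence
    (xl, yl) <-> (xr, yr) for a parallel-axis stereo pair, per Eqs. (5)-(7).
    b is the baseline length and f the image distance."""
    d = xl - xr                     # disparity d_i = x_il - x_ir
    X = b * (xl + xr) / (2.0 * d)
    Y = b * (yl + yr) / (2.0 * d)
    Z = b * f / d
    return X, Y, Z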

To composite natural images that have proper disparities, the parameters f and b must be estimated. However, it is very difficult to estimate these parameters without any information about the images. Therefore, in our method the human operator gives information about the distance between two characteristic points in the left and right images. The characteristic points are selected so that the distance between them can easily be estimated from general knowledge, e.g., the height of a person, the height of a house, and so on.

f and b can be calculated from the following equation when more than three corresponding points are selected and the distances between these points are estimated by the human operator:

b²{((x_{i,l} + x_{i,r})/(2d_i) − (x_{j,l} + x_{j,r})/(2d_j))² + ((y_{i,l} + y_{i,r})/(2d_i) − (y_{j,l} + y_{j,r})/(2d_j))² + f²(1/d_i − 1/d_j)²} = L_{i,j}²,   (8)


where L_{i,j} is the estimated distance between the two points i and j in the world coordinate system.
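
The paper does not state how Eq. (8) is solved; one straightforward sketch substitutes u = b² and v = (bf)², which makes the system linear in (u, v) and solvable by least squares over all operator-supplied point pairs. All names below are ours.

import numpy as np

def estimate_b_f(points_l, points_r, pairs, lengths):
    """Estimate baseline b and image distance f from Eq. (8).

    points_l, points_r : (N, 2) arrays of characteristic-point image coordinates.
    pairs              : list of (i, j) index pairs chosen by the operator.
    lengths            : estimated real-world distances L_ij, one per pair.
    """
    pl, pr = np.asarray(points_l, float), np.asarray(points_r, float)
    d = pl[:, 0] - pr[:, 0]                       # disparities d_i
    xb = (pl[:, 0] + pr[:, 0]) / (2.0 * d)        # (x_il + x_ir) / (2 d_i)
    yb = (pl[:, 1] + pr[:, 1]) / (2.0 * d)
    A, rhs = [], []
    for (i, j), L in zip(pairs, lengths):
        a_ij = (xb[i] - xb[j]) ** 2 + (yb[i] - yb[j]) ** 2
        c_ij = (1.0 / d[i] - 1.0 / d[j]) ** 2
        A.append([a_ij, c_ij])                    # Eq. (8): u*a_ij + v*c_ij = L_ij^2
        rhs.append(L ** 2)
    (u, v), *_ = np.linalg.lstsq(np.asarray(A), np.asarray(rhs), rcond=None)
    b = np.sqrt(u)
    f = np.sqrt(v / u)
    return b, f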

After b and f are estimated, the 3-D reconstruction of the scenery in the images can be performed. Therefore, the foreground objects and the background scenery can be merged naturally by considering the geometric constraints between the left and right images, and between the foreground and background images, respectively.
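
As an illustration of how the composition can keep disparity consistent, the sketch below places an extracted foreground layer at a chosen depth Z by shifting it so that its disparity equals bf/Z from Eq. (7). It is deliberately simplified: the helper names are ours, the foreground layer is assumed to start with zero disparity, and scaling with depth is omitted.

import numpy as np

def shift_layer(image: np.ndarray, mask: np.ndarray, dx: int):
    """Shift a foreground layer and its mask horizontally by dx pixels (wraps at the border)."""
    return np.roll(image, dx, axis=1), np.roll(mask, dx, axis=1)

def composite_stereo(fg, fg_mask, bg_left, bg_right, b, f, Z):
    """Insert the foreground at depth Z so that its left/right disparity is d = b*f/Z (Eq. (7))."""
    d = b * f / Z
    left_fg,  left_mask  = shift_layer(fg, fg_mask, int(round(+d / 2.0)))
    right_fg, right_mask = shift_layer(fg, fg_mask, int(round(-d / 2.0)))
    left  = np.where(left_mask[...,  np.newaxis], left_fg,  bg_left)
    right = np.where(right_mask[..., np.newaxis], right_fg, bg_right)
    return left, right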

3 Experiments

In our experiment, the colors of the background were set to blue and yellow. In Figure 5, a sheet of blue paper of the same kind as the background is rolled around the left arm of the foreground person, and a sheet of yellow paper is put on his shirt, to examine the case of clothes with the same colors as the background (Figure 5(a)). The results of the background color extraction and of the detection of the endpoints of R3 are shown in Figures 5(b) and (c), respectively. The extracted foreground is shown in Figure 5(d). This result shows that the foreground regions whose colors are the same as the background are extracted without fail. The result of compositing the extracted foreground with another background is shown in Figure 5(e).

Figure 6 shows the results of automatically generated stereograms. Figures 6(a) and (b) show the original left and right background images. After estimating the parameters under which the background images were taken, the horizontal plane of the floor can be calculated (Figure 6(c)(d)). The results of the image composition (Figure 6(e)(f)) look natural. The effectiveness of our method has been verified by comparison with images in which a man actually stood at the same place as in the composite images (Figure 6(g)(h)). The composite left and right images are very similar to the real left and right ones, respectively.

These experimental results verify the effectiveness of our proposed method.

4 Conclusions

In this paper, we have proposed a new method that can generate stereograms automatically with a chromakey technique using two-tone striped backgrounds. We utilize the adjacency condition between the two-tone striped areas on the background and extract the foreground regions whose colors are the same as the background.

The geometric constraints are taken into account by utilizing the relationships between characteristic points in the left and right images, and between the foreground and background images, respectively.

Figure 5: Result of region extraction. (a) Original image. (b) Background color extraction. (c) Detection of R6. (d) Foreground extraction. (e) Image composition.

Experimental results show that the foreground region is reliably extracted from the background regardless of the colors of the foreground objects, and that the composited stereo image pairs look very natural, with proper disparities.

As future work, we have to improve the processing speed so that moving pictures can be handled in real time.

Acknowledgement

This work was partly funded by the Hoso-Bunka Foundation, Inc., Japan.

References

[1] Thomas Porter and Tom Duff: "Compositing Digital Images," Computer Graphics (Proceedings of SIGGRAPH84), Vol.18, No.3, pp.253–259, (1984).

[2] Richard I. Land and Ivan E. Sutherland: "Real-Time, Color, Stereo, Computer Displays," Applied Optics, Vol.8, No.3, pp.721–723, (1969).

[3] King-Sun Fu and Jack Kin Yee Mui: "A Survey on Image Segmentation," Pattern Recognition, Vol.13, pp.3–16, (1981).

[4] Wladyslaw Skarbek and Andreas Koschan: "Colour Image Segmentation - A Survey," Technical Report 94–32, Technical University of Berlin, Department of Computer Science, (1994).

[5] Michael Kass, Andrew Witkin and Demetri Terzopoulos: "Snakes: Active Contour Models," International Journal of Computer Vision, Vol.1, pp.321–331, (1988).

[6] Simon Gibbs, Constantin Arapis, Christian Breiteneder, Vali Lalioti, Sina Mostafawy and Josef Speier: "Virtual Studios: An Overview," IEEE Multimedia, Vol.5, No.1, pp.18–35, (1998).

[7] Andrew Wojdala: "Challenges of Virtual Set Technology," IEEE Multimedia, Vol.5, No.1, pp.50–57, (1998).

[8] Alvy Ray Smith and James F. Blinn: "Blue Screen Matting," Computer Graphics (Proceedings of SIGGRAPH96), pp.259–268, (1996).

[9] Atsushi Yamashita, Toru Kaneko, Shinya Matsushita and Kenjiro T. Miura: "Region Extraction with Chromakey Using Striped Backgrounds," Proceedings of IAPR Workshop on Machine Vision Applications (MVA2002), pp.455–458, (2002).

[10] Yasushi Mishima: "A Software Chromakeyer Using Polyhedric Slice," Proceedings of NICOGRAPH1992, pp.44–52, (1992).

Figure 6: Result of stereogram generation. (a) Left background image. (b) Right background image. (c) Calculated horizontal plane of the floor in the left image. (d) Calculated horizontal plane of the floor in the right image. (e) Result of left image composition. (f) Result of right image composition. (g) Actual left image. (h) Actual right image.