
Virtual Studios

Image Compositing Based on Virtual Cameras

Masaki Hayashi
NHK Science and Technical Research Laboratories

The virtual camera method presented here integrates computer graphics and real-time video processing techniques in a new type of virtual studio. At NHK we developed several image compositing systems based on this method. First we look at the virtual camera method, then some of the systems developed with it. Then we discuss future directions and possibilities not only for image compositing but also for the development of sophisticated systems for editing and producing television programs.

Today, actors in television programs and movies commonly appear within a landscape wholly created by computer graphics. Computer graphics evolved for the most part as a computer-oriented technology, which makes it possible to create scenes from nothing by numerical calculation. Image compositing technology, on the other hand, has a different tradition. It evolved in the field of video production as a technique for synthesizing multiple images taken by a real camera. Computer graphics technology has made remarkable progress, and today the computer algorithms that produce the final image have reached a very high level of sophistication. By contrast, image compositing technology relies on already filmed video footage, and the methods that produce the final image still depend on expert human skill and experience.

Technology for integrating computer graphics and image compositing involves the convergence of these two cultural traditions, which already have influenced one another over a long period of time. The recent popularity of image-based rendering and motion capture in computer graphics on the one hand and the virtual studio approach in image compositing on the other hand testify to this ongoing mutual interaction. However, because convergence has only just begun, a comprehensive system that integrates computer graphics and image compositing does not yet exist. Efforts to achieve such a system will have to continue at the academic level as well as the practical level.

Over the last few years, virtual studio technology has become very popular in television program production. NHK developed probably the first practical virtual studio system in the world [1, 2] and in 1991 used it in a science program, "Nanospace" (see Figure 1). Since then we have continued to study the integration of computer graphics and image compositing systems.

I'll begin with an overview of image compositing, then clarify its direction and research targets. Then, focusing on camerawork-related technology, I'll explain the concept of a virtual camera before describing several image compositing systems NHK developed that exploit virtual camera technology. Finally, I'll address future issues regarding other aspects of the system besides camerawork.

Image compositing
Basically, image compositing involves adjusting and matching various conditions in separately shot or generated images, and synthesizing these images to create a completely different scene. The conditions might include such things as camerawork, lighting, resolution of pickups, atmosphere, semantic content, and so on. Which conditions to match for picture synthesis depends on the purpose of the image compositing.

Generally, image compositing is done for the following reasons:

❚ To create a scene in which it appears that the synthesized objects occupy a particular place (realism).

❚ For artistic effect, often with little or no concern for realism (artwork).

Examples of the former include the virtual studio approach and the image compositing effects seen in so many recent special-effects movies (such as Jurassic Park). Examples of the latter encompass videos created for the title and credit segments of TV shows, and music promotion videos, which combine special video effects with various types of image components such as camera images, text images, computer graphics, and so on.

Of these two types of application, the former probably lends itself more readily to realization by technology. The latter depends more on human artistic sensibility and therefore would be hard to reduce to a technological process. Yet it would be of great technological interest to make a machine capable of automatically generating artistic synthesized images based on, for example, a semantic understanding of the video content.

Here, we will focus on the pursuit of realism. The objective in this case is achieved by creating a new realistic image out of multiple image components, matching their attribute conditions as closely as possible. Specific types of conditions include

1. Camerawork—pan, tilt, roll, 3D positioning (x, y, z), zoom, focus

2. Lighting—color temperature, intensity, direction of light source, and so forth

3. Filming conditions—resolution of pickups, recording characteristics (for example, video or film?)

4. Environmental conditions—fog, shadows, and so on

5. Interaction between objects

In the computer graphics field, where a scene is literally created out of nothing, conditions 1 through 4 have been studied thoroughly. You can readily obtain the desired computer graphics images simply by entering data as matching values for conditions in a computer. However, the image compositing field has produced no clear-cut answers—despite the ability to use many aspects of computer vision—because the conditions are generally determined by the ambient physical conditions at the time of shooting.

Turning to condition 5, we can think of many cases where interactions between objects occur. For example, two people filmed on different occasions appear to shake hands, rain falls and objects get drenched, someone walks along leaving footprints behind him, and so on. Effects such as these are quite difficult to achieve automatically, so in the actual production of TV programs and movies today, special-effects professionals still do most of these jobs manually.

The virtual studio creates a realistic composite scene synchronized with foreground camerawork by driving computer graphics using camera operation data. In other words, although matching camerawork condition 1 achieves the main purpose in the virtual studio, matching conditions 2 through 5 currently mostly relies on experience and trial and error.

In this work, we at NHK propose a filming system that outputs the desired image when fed external data for filming conditions 1 through 5. We refer to this as a virtual camera. Both computer graphics techniques and image processing techniques may contribute to a virtual camera. Figure 2 shows a general schematic of an image compositing system based on the virtual camera concept. In a conventional virtual studio, for example, the virtual camera for the background is a real-time graphics workstation, and the virtual camera for the foreground is a television camera. Output data from a sensor mounted on the television camera serves as the filming conditions.

Figure 1. Scene from the TV program "Nanospace."

Figure 2. Image compositing based on the virtual camera concept.

Virtual camera
Two fundamental approaches realize the virtual camera:

❚ The first approach generates images by using computer graphics to set the filming parameters (model-based rendering).

❚ The second approach obtains images by processing the original images taken with a real camera to match the filming parameters (image-based rendering).

The first approach imposes few constraints on the filming parameters, but it's difficult to replace all the real objects by computer graphics when rendering naturalistic landscapes. Furthermore, this approach requires immense computational resources to produce images that closely simulate the filming parameters. Turning to the second approach for realizing the virtual camera, let's take camerawork as an example. We first film a real scene using a wide angle and high resolution, and store the image data in a computer. Then we get panning and tilting effects by altering the position and size of the extracted area.
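As a rough sketch of this second, image-based approach (not code from the original system; the NumPy array handling, function name, and output size are assumptions for illustration), panning and tilting amount to choosing which sub-window of the stored wide-angle frame to extract, and zooming amounts to changing that window's size before resampling it to the output resolution:

import numpy as np

def virtual_pan_tilt_zoom(frame, center_x, center_y, zoom, out_w=720, out_h=480):
    """Simulate camerawork on a stored wide-angle frame.

    frame    : H x W x 3 array holding the pre-filmed wide-angle image.
    center_x : horizontal position of the virtual view (pan), in pixels.
    center_y : vertical position of the virtual view (tilt), in pixels.
    zoom     : >1 narrows the extracted window (zoom in), <1 widens it.
    """
    h, w = frame.shape[:2]
    win_w = int(out_w / zoom)
    win_h = int(out_h / zoom)
    # Clamp the window so it stays inside the stored image (a real system
    # would instead constrain the allowed camerawork).
    x0 = int(np.clip(center_x - win_w // 2, 0, w - win_w))
    y0 = int(np.clip(center_y - win_h // 2, 0, h - win_h))
    window = frame[y0:y0 + win_h, x0:x0 + win_w]
    # Nearest-neighbour resampling to the output size; a production system
    # would use proper filtering.
    ys = (np.arange(out_h) * win_h / out_h).astype(int)
    xs = (np.arange(out_w) * win_w / out_w).astype(int)
    return window[ys][:, xs]

# Example: pan toward the right half of a stored 2000 x 3000 frame and zoom in 2x.
wide = np.zeros((2000, 3000, 3), dtype=np.uint8)
view = virtual_pan_tilt_zoom(wide, center_x=2200, center_y=1000, zoom=2.0)
print(view.shape)   # (480, 720, 3)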

Although the second approach can certainly handle photorealistic images, it imposes major constraints on the filming parameters. In computer graphics, the first approach is generally called model-based rendering, and the second method is called image-based rendering. Model-based rendering has a long history, and fully developed commercial systems are available. By contrast, image-based rendering has a much shorter history. It needs special techniques whenever different filming parameters are attempted for video materials already filmed. This is obviously a serious constraint, and the issue has been studied intensively in recent years to find a solution [3, 4].

As mentioned, this article primarily concerns camerawork as filming parameters. Here, based on an image-based rendering approach, a method called virtual shooting provides a different kind of camerawork from that with an actual camera. At NHK we have developed a number of image compositing systems that apply this virtual shooting method to a virtual studio.

Virtual shooting creates the impression of different camerawork in an image already shot. The virtual shooting process described in this article appears in Figure 3. Once a 3D object is filmed to produce a 2D image, this image is mapped onto a 2D surface and then refilmed using different camerawork. This process requires the following data:

1. Real camera data—the direction, position, and angle of view of the real camera when the object was filmed

2. Real object data—the orientation and position of the real object

3. Virtual object data—the orientation, position, and scale of the surface on which the previously filmed 2D image is mapped when placed in a virtual world

4. Virtual camera data—the direction, position, and angle of view of a virtual camera placed in a virtual world

Figure 3. Virtual shooting process.

Table 1 summarizes these four parameters. Transforming the input image to the desired output image is a perspective transformation. If all the above parameters are known, then the object image on the output screen can be numerically calculated. In other words, by applying virtual camera data that differs from the original real camera data, we can obtain a pseudo image of the object just as if we had applied camerawork completely different from that used when the object was actually filmed. For example, by placing the virtual camera farther from the object than the real camera was, we create the impression that the object was shot from farther away. In this virtual shooting process, the image diminishes in size.
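To make the principle concrete, here is a minimal numerical sketch (not the article's real-time processor; the pinhole model, rotation convention, function names, and image width are assumptions): the corners of the 2D plate carrying the previously filmed image are placed in the virtual world using the virtual object data and then projected through the virtual camera, so moving the virtual camera farther away shrinks the projected image exactly as described.

import numpy as np

def rotation(pan, tilt, roll):
    """Compose pan (about y), tilt (about x), and roll (about z) into a 3x3 matrix."""
    cy, sy = np.cos(pan), np.sin(pan)
    cx, sx = np.cos(tilt), np.sin(tilt)
    cz, sz = np.cos(roll), np.sin(roll)
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Rx @ Ry

def project(points_world, cam_pos, cam_rot, angle_of_view, image_width=720):
    """Pinhole projection of world points into the (virtual) camera image."""
    f = (image_width / 2) / np.tan(angle_of_view / 2)     # focal length in pixels
    cam = (cam_rot.T @ (points_world - cam_pos).T).T      # world -> camera coordinates
    return f * cam[:, :2] / cam[:, 2:3]                   # perspective divide

# Virtual object data: a 2 m x 1 m plate, scaled and positioned in the virtual world.
plate = np.array([[-1, -0.5, 0], [1, -0.5, 0], [1, 0.5, 0], [-1, 0.5, 0]], float)
scale, plate_pos = 1.0, np.array([0.0, 1.0, 0.0])
corners_world = scale * plate + plate_pos

# Virtual camera data: same direction, two different distances from the plate.
R = rotation(pan=0.0, tilt=0.0, roll=0.0)
near = project(corners_world, np.array([0.0, 1.0, -3.0]), R, np.radians(40))
far  = project(corners_world, np.array([0.0, 1.0, -6.0]), R, np.radians(40))

# Moving the virtual camera twice as far away halves the projected plate's width.
print(np.ptp(near[:, 0]), np.ptp(far[:, 0]))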

This type of principle has been used for various aspects of image compositing work. However, so far few have attempted to use a geometrically consistent image-processing approach that embraces all four parameters in Table 1. Most compositing work, therefore, remains a manual endeavor in which the synthesized image is gradually created while the technician watches and assesses the effects.

This article provides an overview of image compositing systems capable of using all four parameters listed in Table 1. We also examine the key features of a system provisionally called a virtual camera system.

Virtual camera systems
We went through several stages in developing a virtual camera system. Here we look at the different features and how the four parameters of Table 1 were realized.

A desktop virtual camera system [5]
We developed version 1 of the virtual camera as part of a larger research project to design a desktop program production (DTPP) system [6]. The DTPP system's principal purpose is to create an environment that supports managing an entire program production process from planning to final editing. The desktop virtual camera system provides the studio part of the entire DTPP process.

Let me explain the desktop virtual camera system briefly. Real-time computer graphics create the studio set of a virtual studio. Actors are filmed full size in front of a chromakey blue screen, and the images are recorded on laser disk. Then the playback images from the laser disk are image processed using the virtual shooting method. Thus, the images from two virtual cameras are synthesized in the system—one provides model-based computer graphics and the other provides an image-based video of the actors. The four parameters in Table 1 are realized as follows:

❚ Real camera data—As a person walks around, the camera pans to maintain a full-size video of the person. A rotary encoder mounted on the real camera head measures this panning data, and real camera data is recorded in a computer along with laser disk time code.

❚ Real object data—Here, the distance between the person and the real camera stays fixed, and the person's movement is restricted to the horizontal direction. This permits the person's position to be calculated using the distance between the real camera and the actor and the real camera data mentioned above. These calculations are used as the real object data.

❚ Virtual camera data—Virtual camera data is provided by a user operating a "camerawork tool."

❚ Virtual object data—The user can change the virtual actor's position (discussed in detail below) by using a mouse on a workstation.

Virtual shooting is achieved by using the above data and by image processing the image from the laser disk.
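As a small illustration of how the real object data can be derived in this setup (the planar geometry, function name, and numbers below are assumptions, not the system's actual code), the actor's floor position follows directly from the measured pan angle and the known, fixed camera-to-actor distance:

import math

def actor_position(camera_x, camera_z, pan_angle_deg, distance_m):
    """Estimate the actor's floor position (x, z) from real camera data.

    The desktop system keeps the camera-to-actor distance fixed and the
    actor's movement horizontal, so one pan angle per frame is enough.
    """
    pan = math.radians(pan_angle_deg)
    x = camera_x + distance_m * math.sin(pan)   # pan = 0 means straight ahead (+z)
    z = camera_z + distance_m * math.cos(pan)
    return x, z

# One sample per video frame, e.g. recorded alongside the laser disk time code.
for frame, pan in enumerate([-5.0, 0.0, 5.0, 10.0]):
    print(frame, actor_position(0.0, 0.0, pan, distance_m=4.0))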


Table 1. Parameters for the virtual shooting procedure.

Data type            Parameters
Real camera data     Pan, tilt, rotation; x, y, z; angle of view
Real object data     x, y, z
Virtual camera data  Pan, tilt, rotation; x, y, z; angle of view
Virtual object data  Pitch, yaw, roll; x, y, z; scale


Figure 4 shows the system hardware configuration. The graphics workstation generates the set image in real time. Meanwhile, the playback images of the actor—stored on laser disks—are processed by real-time image processors. In this system, a maximum of two actors can appear on the computer graphics studio set. Finally, the computer graphics image and processed images of the actor are synthesized by multilayer compositing to produce the final output image (see Figure 5). A regular camera head for studio work with the TV camera removed serves as the camerawork tool.

Figure 6 shows the operating environment. Using this system, a user can shoot the virtual studio space in real time by operating the camera head on a desktop work space. Figure 7 shows a graphical user interface screen on a workstation. Using this graphical user interface, the user can select actors, move actors to different locations, adjust camera settings, modify computer graphics, and perform a host of other actions simply by manipulating the mouse.

Figure 4. Schematic of the desktop virtual camera system.

Figure 5. Output images of the desktop virtual studio.

A virtual camera manipulator
In version 1 of the system, the user could pan, tilt, and zoom the virtual camera as if operating a real camera. Although the user could move the camera to the appropriate spot using the mouse, as if setting up in a real studio, the version 1 system could not accommodate continuous tracking shots. The basic problem regarding 3D movement of the camera derives from the fact that the images of the actor are simple 2D surfaces. Consequently, when the virtual camera moves in 3D space, it cannot obtain proper images.

Taking an extreme example, consider what happens if the virtual camera wraps around the actor from the side to the back. The actor's image is just a flat planar surface—the actor's side and back cannot be recreated if they weren't filmed originally.

Also, since the operating tool for the virtual camera is the camera head itself, the user cannot manipulate any camerawork data except for pan, tilt, and zoom. This particular problem led us to develop a tool called a virtual camera manipulator with six degrees of freedom of movement, which we connected to the virtual studio. This enables the user to perform camerawork in real time with full degrees of freedom, such as pan, tilt, roll, x, y, and z (see Figure 8).

One effective solution to the basic problem of the camera's 3D movement is to use a motion control camera for the virtual camera. We will discuss this solution in the next section. Here we will address treating the actor as a 2D surface, as in version 1 of the system. Let's next consider the virtual camera system featuring the virtual camera manipulator.

Figure 6. Operating environment for the desktop virtual studio.

Figure 7. User interface screen for the desktop virtual studio.

Figure 8. Virtual camera manipulator.

We made a manipulator that lets a human operator perform the computer graphics camerawork by hand. The manipulator has six-degrees-of-freedom movement and permits the operator to pan, tilt, roll, zoom, and position the virtual camera at any x, y, z coordinate in real time from the desktop.

Implementing such a manipulator can follow one of two basic approaches:

❚ Build a scaled-down version of a six-degrees-of-freedom camerawork mechanism on the desktop that the operator can manipulate by hand.

❚ Implement six-degrees-of-freedom camerawork with mechanisms such as those used to control a vehicle, namely a brake and accelerator that can control the manipulator's speed and direction.

In the work described here we adopted the first approach, employing the link structure illustrated in Figure 8. (We later combined these two methods; see the next section.) This six-degrees-of-freedom robot arm enables the operator to manually pan, tilt, roll, and position the virtual camera at any x, y, z coordinate with one hand. Potentiometers mounted at each joint on the arm detect the angle of rotation. The virtual camera manipulator uses the angle of rotation data at each joint to calculate and output six-degrees-of-freedom camera data in real time.
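A simplified forward-kinematics sketch of that calculation follows; the link lengths, joint layout, and function name are assumptions for illustration rather than the actual manipulator's geometry. Each potentiometer reading becomes a joint angle, and the kinematic chain yields the virtual camera's x, y, z position together with pan, tilt, and roll:

import math

LINK1, LINK2 = 0.30, 0.25   # assumed link lengths in metres

def camera_pose(arm_pan, link1_tilt, link2_tilt, cam_pan, cam_tilt, cam_roll):
    """Turn six joint angles (radians) into six-degrees-of-freedom camera data."""
    # Two tilting links stacked on a panning base give the camera head position.
    reach  = LINK1 * math.cos(link1_tilt) + LINK2 * math.cos(link1_tilt + link2_tilt)
    height = LINK1 * math.sin(link1_tilt) + LINK2 * math.sin(link1_tilt + link2_tilt)
    x = reach * math.cos(arm_pan)
    z = reach * math.sin(arm_pan)
    y = height
    # The camera head joints directly provide the orientation angles.
    return {"x": x, "y": y, "z": z,
            "pan": arm_pan + cam_pan, "tilt": cam_tilt, "roll": cam_roll}

# One pose per video field, computed from the potentiometer readings.
print(camera_pose(0.2, 0.6, -0.3, 0.05, -0.1, 0.0))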

By integrating the camera manipulator with the desktop virtual camera system, we developed an advanced system enabling the user to perform six-degrees-of-freedom camerawork from the desktop to create composite worlds synthesizing natural video and computer graphics. Figure 9 shows an overview of the operating environment, and Figure 10 is an example filmed using the system.

Our experience with actually operating the system demonstrated its practicality and ease of use in filming, but nevertheless exposed a number of problems. First and foremost, in the compositing process people are represented as 2D surfaces. However, actually moving the camera position up, down, right, and left using the manipulator produced virtually no sense of unnaturalness in the composite if the range of movement was not too great. In particular, when following an actor as he walked toward the left or toward the right, the system produced a natural-looking composite. If the user comes to understand and appreciate how best to use the system (for example, by using wraparound camerawork sparingly), the system proves very practical. The system is also extremely effective for depth-direction tracking shots because almost no sense of unnaturalness is associated with movement in this direction.

Next, let us consider some of the problems associated with the camera manipulator:

❚ When moving the camera in a particular direction using the articulated robot arm, the amount of resistance differs depending on the direction of movement, so a smooth transition is not always possible.

❚ Professional camera people require the camera head to have an appropriate "friction" and a "sticky" feeling for smooth camerawork. However, this is difficult to realize with the manipulator because the mechanism would be too big and complicated.

Figure 9. Overview of the operating environment for the virtual studio with manipulator.

Figure 10. Output images of the virtual studio with manipulator.

Several commercial versions of this type of manipulator are now available in the field of virtual reality, but to our knowledge they all have these same problems.

A virtual camera system using a motion control camera [7]

This section describes the final version of the system, which supports the maximum degrees of freedom of camerawork. Let me highlight the new virtual camera system's three most significant features:

1. The system can film an actor from any direction—from 360 degrees around the actor—while the camera pans, tilts, rolls, and moves to an x-y-z position.

2. The system can handle continuous camerawork over a wide range from very close up to super long distance shots in real time. Exploiting this feature, the virtual camera can move to a position far beyond the physical limitations of studio space.

3. By operating the new manipulator, the user can perform camerawork in real time.

Let me briefly describe how these features were implemented:

1. To implement the first feature, we constructed a motion-control camera (MC camera) capable of filming an actor with six-degrees-of-freedom movement (pan, tilt, roll, and x-y-z position) in real time. Placing the actor on a turntable lets the system easily film the actor from any direction.

2. Figure 11 illustrates the principle behind how the second feature was implemented. When the virtual camera lies within the approximately three-meter area where the MC camera can move physically, the positions of the virtual camera and the real camera coincide, and no video processing is performed (Figure 11a). However, when the virtual camera moves outside this area, the image processor applies a geometrical transformation (that is, virtual shooting) to the real camera image to obtain an equivalent effect (Figure 11b).

3. Finally, to implement the third feature, we constructed a new type of manipulator that supports a full range of camerawork while the cameraperson observes the composite video.

Figure 11. Principle of the motion control camera-based virtual camera system. (a) Distance between a camera and an object is less than 3m. (b) When the camera moves out more than 3m.

Figure 12. Schematic of the motion-control camera-based virtual camera system.

Figure 12 shows a schematic of the system. When the distance between the camera and the object is less than 3 meters (Figure 11a), the system controls the MC camera according to the virtual camera data. When the virtual camera moves beyond 3 meters (Figure 11b), the system moves the MC camera to the position where a straight line linking the virtual camera and the actor intersects a 3-meter half sphere surrounding the actor. With this arrangement, the virtual camera does not move at a right angle to the line connecting the actor with the real camera. Consequently, occlusion does not occur because of the wraparound effect discussed earlier. To be more specific, occlusion occurs in the direction of depth. The thicker the photographed object in the depth direction, the more severe this becomes. We calculated [7] that if the thickness of the object is 50 centimeters and the distance between the object and the real camera exceeds about 3.5 meters, the average occlusion becomes less than 1 pixel. In the present system, since the MC camera operates within a range of around 3 meters, occlusion in the depth direction does not become a serious problem.
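A minimal sketch of this placement rule follows (the vector arithmetic, names, and printout are assumptions made for illustration; the real system drives an actual motion-control camera and a dedicated image processor). Inside the 3-meter range the real camera simply takes the virtual camera's position; outside it, the real camera sits where the actor-to-virtual-camera line crosses the 3-meter hemisphere, and virtual shooting makes up the difference:

import numpy as np

RADIUS = 3.0   # movable range of the MC camera, in metres

def place_mc_camera(actor_pos, virtual_cam_pos):
    """Return (real_camera_position, needs_virtual_shooting)."""
    offset = virtual_cam_pos - actor_pos
    distance = np.linalg.norm(offset)
    if distance <= RADIUS:
        # Real and virtual cameras coincide; no video processing is needed.
        return virtual_cam_pos, False
    # Put the MC camera on the line from the actor toward the virtual camera,
    # on the 3 m half sphere, so no wraparound occlusion is introduced.
    real_pos = actor_pos + offset * (RADIUS / distance)
    return real_pos, True

actor = np.array([0.0, 0.0, 0.0])
for cam in (np.array([0.0, 1.5, 2.0]), np.array([0.0, 2.0, 12.0])):
    pos, virtual = place_mc_camera(actor, cam)
    print(pos.round(2), "virtual shooting" if virtual else "real shooting")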

Regarding the four sets of parameters required for the virtual shooting process listed in Table 1, the real camera and real object data are derived from actual spatial measurements, while the virtual camera and virtual object data come from outside the system and are provided by the user. These parameters are entered into a specially designed real-time image processor [7] capable of performing perspective transformation based on the virtual shooting method. To implement the approach illustrated in Figure 11, the system must be capable of switching between real shooting when the virtual camera is within the movable range and virtual shooting when the virtual camera moves outside the movable range. Adopting the setup shown in Figure 12, just controlling the MC camera makes this switching automatic. In other words, when the virtual camera is in the movable range, the parameters of the real camera and the virtual camera become the same. The virtual shooting processing then becomes 1:1 and is identical to the actual shooting.

Figure 13. Virtual camera manipulator.

Figure 14. Operating environment for the virtual studio using the motion control camera.

In this system users need to perform continuous camerawork, from close-up shots to super long shots. This led us to construct a new type of manipulator, quite different from the one described earlier. Figure 13 shows what this new manipulator system looks like. High-precision rotary encoders are mounted on each axis. Data from these encoders is used to calculate pan, tilt, roll, and the x-y-z location, and the manipulator outputs camera data at the video rate.

Another powerful capability incorporated in this tool is distance zooming. With just a flick of a rocker switch, the user can move in for close-ups or out for distance shots along the current line of sight. This lets the user position the camera at any x-y-z coordinate in the close space around the actor using the manipulator, then use the distance zooming function for more distant camerawork. With controls somewhat like operating an airplane, the system supports continuous camerawork covering the entire range from very close to long distance shots.
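The distance-zooming control might be sketched as follows (an illustrative assumption about the behavior, not the manipulator's actual firmware): each flick of the rocker switch moves the virtual camera along its current line of sight toward or away from the look-at point, leaving pan, tilt, and roll untouched:

import numpy as np

def distance_zoom(cam_pos, look_at, step_m):
    """Move the virtual camera along its current line of sight.

    step_m > 0 dollies in toward the look-at point, step_m < 0 dollies out.
    """
    direction = look_at - cam_pos
    direction = direction / np.linalg.norm(direction)
    return cam_pos + step_m * direction

cam = np.array([0.0, 1.6, -2.5])     # close-space position set with the arm
target = np.array([0.0, 1.0, 0.0])   # roughly the actor's position
for _ in range(3):                   # three flicks of the switch, dollying out
    cam = distance_zoom(cam, target, step_m=-5.0)
print(cam.round(2))                  # now a long-distance shot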

Figure 14 shows the operating environment for the system, and Figure 15 provides an example of the system's video output. For this example, we used a mannequin as the actor on a 200 × 200-meter piece of computer graphics-created ground. Naturally, the system could cover a much more extensive area.

When the system was used in real studio work, other practical problems arose. We had to construct a special infrared sensor around the performer to avoid accidentally hitting him with the camera. We also had to consider the rotation speed of the turntable in terms of safety and its inertia.

Figure 15. Output images of the virtual studio using the motion control camera.

The shape of things to come
Up to this point we have surveyed a number of systems for producing TV programs that are either available now or soon will be. This section highlights the major issues likely to emerge in the years ahead and suggests some possibilities that might open up beyond that.

Unified handling of virtual camera video materials
The virtual camera concept will undoubtedly become much more pervasive in the years ahead. All sorts of video components could be made available for image compositing applications. From the standpoint of camerawork, it's critical to develop a robust software environment that can handle these components in a simple, unified way. One approach would be to set up an icon-like format representing the camerawork attached to an image component. Figure 16 shows this format. The shaded boxes represent camerawork attached to a video component. Black triangles show the ability to output or accept camerawork. The computer graphics box, for example, accepts camerawork but has no output. The virtual shooting module works with real and virtual camerawork data. The manipulator simply outputs camerawork. This example describes a normal virtual studio and a desktop virtual camera system in the icon format.

Figure 16. Video components with camerawork represented by an icon-like format.

Where the camerawork is unknown, it could be analyzed over a period of time using techniques developed in the area of computer vision. Reasonable estimates of camerawork could then be attached to the video components. With this system in effect, composite images could be created automatically with matching camerawork simply by linking the proper camerawork icons and assigning the image components to the composite layer. To change the camerawork, the user would employ a virtual shooting icon by providing it with real and virtual camerawork data.
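One way to picture such an icon format in software is sketched below; the class names, fields, and linking rules are assumptions invented for illustration, not an existing NHK format. Each video component carries an optional camerawork record plus flags for whether it can output or accept camerawork, and a simple rule decides how two components could be wired into a matched composite:

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Camerawork:
    # One sample of camerawork data attached to a video component.
    pan: float
    tilt: float
    roll: float
    x: float
    y: float
    z: float
    angle_of_view: float

@dataclass
class Component:
    # A video component in the icon-like format.
    name: str
    outputs_camerawork: bool                        # black triangle: outputs camerawork
    accepts_camerawork: bool                        # black triangle: accepts camerawork
    camerawork: Optional[List[Camerawork]] = None   # per-frame data, None if unknown

def link(source: Component, target: Component) -> str:
    """Describe how two icons could be wired to produce a matched composite."""
    if source.outputs_camerawork and target.accepts_camerawork:
        return f"{source.name} drives {target.name} directly"
    if source.camerawork is not None and target.accepts_camerawork:
        return f"re-shoot {source.name} through a virtual shooting icon, then drive {target.name}"
    return f"estimate camerawork for {source.name} first (computer vision)"

camera = Component("real TV camera", outputs_camerawork=True, accepts_camerawork=False)
cg_set = Component("computer graphics set", outputs_camerawork=False, accepts_camerawork=True)
actors = Component("laser disk actor video", outputs_camerawork=False, accepts_camerawork=False,
                   camerawork=[Camerawork(0.0, 0.0, 0.0, 0.0, 1.2, -4.0, 0.7)])

print(link(camera, cg_set))   # the normal virtual studio wiring
print(link(actors, cg_set))   # the desktop virtual camera wiring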

Certainly there are limitations to the camerawork available to produce composite scenes (for example, rough picture quality due to video processing, or an unnatural image from wrapping around the subject), but you could readily conceive a scheme for creating the final video through interactive correction using a graphical user interface. This would definitely facilitate the incorporation of the virtual camera concept in commercial offline image compositing software applications (such as Flame [8]). It would also be very significant in terms of supporting real-time applications.

Extending the virtual camera beyond camerawork
As noted earlier regarding the examples in the "Image compositing" section, other conditions besides camerawork must be matched for anyone to use the virtual camera approach. Certainly much more study must focus on technologies for processing original video materials to make them virtual in other conditions, such as lighting and filming conditions.

One example would be a technique for altering the direction of light. In the model-based world of computer graphics (CG), light, filming conditions, and other parameters have been intensively studied with the aim of producing photorealistic CG images. The image-based approach remains in the initial research stage in addressing such issues, but one possible scheme would be to estimate the shape of the object to be filmed, then alter the lighting conditions based on the shape data. The interaction between photography and computer graphics images will also continue to be a major issue.
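As a toy illustration of that scheme (purely an assumption about how it might be prototyped, not a technique from the article), Lambertian shading recomputed from an estimated normal map lets already filmed footage be re-lit from a new light direction:

import numpy as np

def relight(albedo, normals, new_light_dir):
    """Re-shade an image from estimated shape data.

    albedo        : H x W x 3 reflectance estimated from the original footage.
    normals       : H x W x 3 unit surface normals estimated for the object.
    new_light_dir : 3-vector pointing toward the new light source.
    """
    light = np.asarray(new_light_dir, dtype=float)
    light = light / np.linalg.norm(light)
    # Lambertian term: clamp negative values (surfaces facing away stay dark).
    shade = np.clip(normals @ light, 0.0, 1.0)
    return albedo * shade[..., None]

# Tiny synthetic example: a flat patch lit from the upper left instead of the front.
albedo = np.ones((2, 2, 3)) * 0.8
normals = np.zeros((2, 2, 3))
normals[..., 2] = 1.0   # all normals face the camera
print(relight(albedo, normals, new_light_dir=[-1.0, 0.0, 1.0])[0, 0])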


Conversion from image-based to model-based images
Although this is an obvious point, there is a limit to how much can be accomplished by further processing video once filmed. Therefore, naturally the idea is starting to emerge of replacing a person, for example, with a perfect computer graphics-generated replica of the person. In fact, the technique—called substituting a virtual actor—has already begun to see use. The widespread application of range sensors and motion-capture techniques in video production today essentially amounts to this virtual actor approach. However, when a 3D computer graphics-generated image is substituted for an actual filmed object today, the method involves producing only the video images needed, and only the required parts are rendered in detail. This method requires more thorough research.

Analyzing and using professional camerawork
Dedicated manipulators use virtual camera data that human operators produce through manual methods. But there is a limit to how much camerawork human professionals can do manually. Using the motion-control camera system described earlier, for example, the range of camerawork is so extensive that it's nearly impossible to finish the camerawork in one session in real time. It's therefore essential to adopt some of the same methods used in available computer graphics tools for producing camerawork, particularly the computer graphics approach of producing interconnected camerawork by setting multiple control points and connecting the points with a spline curve. However, let me also caution that the human touch is starting to disappear from much of the camerawork that you see in the computer graphics world today. It's therefore imperative that we continue to analyze the camerawork of human professionals and try to identify the rules inherent in their work. More research should focus on exploiting the rules of good professional camerawork to develop a semiautomatic camerawork production system.
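A common way to realize the control-point idea is sketched below, under the assumption of Catmull-Rom interpolation (the article does not specify a particular spline): the operator sets a few control positions and the system samples a smooth camera path between them at the video rate.

import numpy as np

def catmull_rom(p0, p1, p2, p3, t):
    """Catmull-Rom interpolation between p1 and p2 (t in [0, 1])."""
    return 0.5 * ((2 * p1) + (-p0 + p2) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
                  + (-p0 + 3 * p1 - 3 * p2 + p3) * t ** 3)

def camera_path(control_points, frames_per_segment=60):
    """Interpolate camera positions (or any pose parameters) through control points."""
    pts = np.asarray(control_points, dtype=float)
    # Pad the ends so the path passes through the first and last control points.
    pts = np.vstack([pts[0], pts, pts[-1]])
    path = []
    for i in range(1, len(pts) - 2):
        for t in np.linspace(0.0, 1.0, frames_per_segment, endpoint=False):
            path.append(catmull_rom(pts[i - 1], pts[i], pts[i + 1], pts[i + 2], t))
    path.append(pts[-2])   # end exactly on the last control point
    return np.array(path)

# Three control positions set by the operator; the system fills in 60 frames per segment.
controls = [(0.0, 1.6, -2.0), (3.0, 2.0, -4.0), (6.0, 1.6, -10.0)]
print(camera_path(controls).shape)   # (121, 3)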

Embedding video in CG and applying CG-based rendering
The various systems described in this article are similar to the extent that filmed video is first video processed, then externally composited with computer graphics material. However, we could obtain the same effect by mapping the filmed video onto a flat surface defined in CG 3D space, then rendering it together with the background scene. The system could be implemented much more compactly using this latter approach, because we could dispense with both the video processor and the image compositor. What's more, the resolution between the video and computer graphics would be almost exactly the same, and this has numerous advantages. For example, fog flash effects could be implemented much more easily, and shadowing could be applied from within the computer graphics.

Integration of spatial image compositing and temporal image editing, and automatic program production
As noted earlier, in addition to camera data, a virtual camera must also handle lighting conditions, filming conditions, and other parameters. However, broadening our interpretation, let's consider what might be achieved by extending this approach to the semantic content of video materials. Image compositing generally involves arranging video materials in a spatial direction, while video editing clearly involves arranging video materials temporally. You might say that producing a program involves arranging video materials along both a spatial and a time dimension. The issues addressed in this article focused narrowly on camerawork in the spatial direction, but a number of very interesting themes emerge if we broaden our interpretation. For instance, if you consider video editing along the time axis, the significance of lining up one particular clip next to another clip depends not only on the continuity of semantic meaning between the two clips, it also depends on, for example, the pattern of camerawork between the clips and the continuity of sound between the clips. In other words, data inherent in the material contribute to the editing process. This suggests that it might be possible to implement a semiautomatic compositing and editing method by attaching attribute data to video materials and then searching for links between the attributes.

This approach raises a host of interesting issues calling for further research: What attributes would be selected? How would the attributes be attached to the video? How would the attributes actually be used? Eventually, this approach might even enable us to describe the structure of TV programs. Further, if we could identify the structures of programs, this would introduce the possibility of a largely automated process of program production by creating programs with analogous structures but different contents [9].
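To make the attribute-linking idea concrete, here is a very small sketch (every attribute name and the scoring rule are assumptions, chosen only for illustration): each clip carries attribute data, and candidate successors are ranked by how well those attributes continue from one clip to the next:

from dataclasses import dataclass

@dataclass
class Clip:
    name: str
    subject: str        # semantic content, e.g. who or what is on screen
    camerawork: str     # e.g. "pan-left", "static", "zoom-in"
    sound_level: float  # normalized loudness at the cut point

def continuity(a: Clip, b: Clip) -> float:
    """Score how well clip b continues clip a (higher is better)."""
    score = 0.0
    score += 2.0 if a.subject == b.subject else 0.0          # semantic continuity
    score += 1.0 if a.camerawork == b.camerawork else 0.0    # camerawork pattern
    score += 1.0 - abs(a.sound_level - b.sound_level)        # sound continuity
    return score

def next_clip(current: Clip, candidates: list) -> Clip:
    """Pick the candidate with the best attribute continuity."""
    return max(candidates, key=lambda c: continuity(current, c))

a = Clip("interview-1", "presenter", "static", 0.6)
candidates = [Clip("cutaway-sky", "scenery", "pan-left", 0.2),
              Clip("interview-2", "presenter", "static", 0.5)]
print(next_clip(a, candidates).name)   # interview-2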


Conclusion
By matching the camerawork across various video components and compositing the images, virtual camera systems can create realistic synthetic scenes that look as if they had actually been filmed on location. From a hardware point of view, a virtual camera system is achieved by combining a real-time computer graphics technique and a real-time digital video processing technique. In addition to camerawork, other elements include lighting, filming conditions, and the semantic content of video. Matching or harmonizing these parameters across different video materials could open up a greatly expanded role for virtual camera systems in the future. By managing the various types of attribute data attached to video materials, the full implementation of such a method would facilitate image compositing and promote the development of a sophisticated system for editing and producing programs.

Regarding the quality of synthesized images that integrate computer graphics and real images, many recent special effects films are really quite incredible. However, movies are not produced in real time, and a major portion of this work still involves laborious frame-by-frame manual effort. A number of good software applications on the market today (such as Flame) cater to these high-end requirements, vastly improving production efficiency by applying semiautomatic functions to this offline frame-by-frame manual work.

The three types of virtual camera systems profiled in this article all assume real-time performance capability. Broadcasting is one obvious area where real-time performance would be a significant sales point. The end goal of virtual camera systems is to seamlessly integrate computer graphics and video to produce a single integrated virtual world in real time. Numerous challenges have yet to be overcome before this capability will be freely accessible. We at NHK hope that this field will continue to prosper and expand in the years ahead.

Acknowledgment
Thanks go to the members of the Program Production Technology Research Division and the Recording and Mechanical Engineering Research Division of NHK research laboratories.

References
1. T. Yamashita and M. Hayashi, "From Synthevision to an Electronic Set," Proc. Digimedia Conf., Digimedia Secretariat, Geneva, Apr. 1995, pp. 51-57.
2. K. Haseba et al., "Real-Time Compositing System of a Real Camera and a Computer Graphic Image," Proc. Int'l Broadcasting Convention, Amsterdam, Sept. 1994.
3. L. McMillan and G. Bishop, "Plenoptic Modeling: An Image-Based Rendering System," Computer Graphics (Proc. Siggraph 95), Aug. 1995, pp. 39-46.
4. S.E. Chen, "QuickTime VR—An Image-Based Approach to Virtual Environment Navigation," Computer Graphics (Proc. Siggraph 95), Aug. 1995, pp. 29-38.
5. M. Hayashi et al., "Desktop Virtual Studio System," IEEE Trans. on Broadcasting, Vol. 42, No. 4, Sept. 1996, pp. 278-284.
6. K. Enami, K. Fukui, and N. Yagi, "Program Production in the Age of Multimedia—DTPP: Desktop Program Production," IEICE Trans. Inf. & Syst., Vol. E79-D, No. 6, June 1996.
7. M. Hayashi, K. Fukui, and Y. Ito, "Image Compositing System Capable of Long-Range Camera Movement," Proc. ACM Multimedia 96, ACM Press, New York, Nov. 1996, pp. 153-161.
8. Flame, Discreet Logic, http://www.future.com.au/flame.html.
9. M. Hayashi, "Automatic Production of TV Program from Script—A Proposal of TVML," S4-3, Proc. ITE Annual Convention, Institute of Image Information and Television Engineers, Tokyo, July 1996, pp. 589-592.

Masaki Hayashi works for NHK Science and Technical Research Laboratories in the Multimedia Services Research Division. He has engaged in research on image processing, computer graphics, image compositing systems, and virtual studios. He is currently involved in research on automatic TV program generation from text-based scripts written using TVML (TV Program Making Language) by integrating real-time computer graphics, voice synthesizing, and multimedia computing techniques (http://www.strl.nhk.or.jp/TVML/index.html). He graduated from Tokyo Institute of Technology in 1981 in electronics engineering, earning his Masters in 1983.

Readers may contact Hayashi at Multimedia Services Research Division, NHK Science and Technical Research Laboratories, 1-10-11, Kinuta, Setagaya-ku, Tokyo, Japan, 157, e-mail [email protected].
