Crowd Simulation || Crowd Rendering

<ul><li><p>Chapter 7 Crowd Rendering</p><p>7.1 Introduction</p><p>In this chapter, we focus on the technical aspects of real-time crowd simulation, and on how an efficient architecture can be developed. But first, it is important to identify the main goals to achieve. We need:</p><ul><li>Quantity: real-time simulation of thousands of characters,</li><li>Quality: state-of-the-art virtual humans,</li><li>Efficiency: storage and real-time management of massive crowd data,</li><li>Versatility: adaptation to diversified environments and situations.</li></ul><p>The main problem when dealing with thousands of characters is the quantity of information that needs to be processed for each of them. Such a task is very demanding, even for modern processors. Naive approaches, where virtual humans are processed one after another, in no specific order, provoke costly state switches for both the CPU and the GPU. For an efficient use of the available computing power, and to approach hardware peak performance, data flowing through the same path need to be grouped. We thus present an architecture able to handle, early in its pipeline, the sorting of virtual-human-related data into grouped slots. We show results with thousands of simulated characters. As for quality, we display state-of-the-art virtual humans where the user's attention is focused. Characters capable of facial and hand animation are simulated in the vicinity of the camera to improve believability, while farther away, less expensive representations are used. Concerning efficiency of storage and data management, we mainly employ a database to store all the virtual-human-related data. Finally, the presented architecture is versatile enough to be stressed in very different scenarios, e.g., in confined environments like an auditorium or a classroom, as well as in large-scale environments like a crowded fun fair or city.</p><p>In this chapter, we first introduce in Sect. 
7.2 how the virtual humans can be represented, along with their different levels of detail, depending on their distance and eccentricity to the camera. Then, in Sect. 7.3, we fully detail our architecture pipeline, and how the virtual-human-related data are sorted in order to be processed</p><p>D. Thalmann, S.R. Musse, Crowd Simulation, DOI 10.1007/978-1-4471-4450-2_7, Springer-Verlag London 2013</p></li><li><p>faster. In Sect. 7.4, we introduce the motion kit, a data structure specifically developed for managing the different levels of detail at the animation stage. The storage and management of unchanging data with a dedicated database is developed in Sect. 7.5. In Sect. 7.6, we introduce a shadow mapping algorithm applied to a crowd. Finally, in Sect. 7.7, we introduce crowd patches.</p><p>7.2 Virtual Human Representations</p><p>In an ideal world, graphics cards would be able, at each frame, to render an infinite number of triangles with arbitrarily complex shading on them. To visualize crowds of virtual humans, we would simply use thousands of very detailed meshes, e.g., capable of hand and facial animation. Unfortunately, in spite of recent programmable graphics hardware advances, we are still compelled to stick to a limited triangle budget per frame. This budget is spent wisely to be able to display dense crowds without too much perceptible degradation. The concept of levels of detail (LOD), extensively treated in the literature (see [LRC*02]), is exploited to meet our real-time constraints. We specifically discuss levels of detail for virtual humans composing a crowd: depending on the location of the camera, a character is rendered with a specific representation, resulting from a compromise between rendering cost and quality. In this section, we first introduce the data structure we use to create and simulate virtual humans: the human template. 
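The camera-dependent choice of representation can be sketched as follows. This is a minimal illustration in Python; the distance thresholds and all names are ours, not the book's:

```python
# Sketch of per-character LOD selection by distance to the camera.
# Thresholds (in scene units) are illustrative, not taken from the book.
DEFORMABLE, RIGID, IMPOSTOR = "deformable", "rigid", "impostor"

def select_lod(char_pos, cam_pos, near=10.0, mid=50.0):
    """Pick a representation from the squared distance to the camera."""
    d2 = sum((c - p) ** 2 for c, p in zip(char_pos, cam_pos))
    if d2 < near * near:
        return DEFORMABLE   # full skeletal animation, close to the camera
    if d2 < mid * mid:
        return RIGID        # baked geometry, medium range
    return IMPOSTOR         # two textured triangles, far away
```

In a real system the test would also account for the eccentricity to the camera mentioned above, not only distance.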
Then, we describe the three levels of detail a human template uses: the deformable mesh, the rigid mesh, and finally the impostor.</p><p>7.2.1 Human Template</p><p>A type of human such as a woman, man, or child is described as a human template, which consists of:</p><ul><li>A skeleton, composed of joints, representing articulations,</li><li>A set of meshes, all representing the same virtual human, but with a decreasing number of triangles,</li><li>Several appearance sets, used to vary its appearance (see Chap. 3),</li><li>A set of animation sequences which it can play.</li></ul><p>Each rendered virtual human is derived from a human template, i.e., it is an instance of a human template. In order for all the instances of the same human template to look different, we use several appearance sets, which allow us to vary the texture applied to the instances, as well as to modulate the colors of the texture.</p><p>7.2.2 Deformable Mesh</p><p>A deformable mesh is a representation of a human template composed of triangles. It envelops a skeleton of 78 joints, used for animation: when the skeleton moves,</p></li><li><p>the vertices of the mesh smoothly follow its joint movements, much like our skin. We call such an animation a skeletal animation. Each vertex of the mesh is influenced by one or a few joints. Thus, at every keyframe of an animation sequence, a vertex is deformed by the weighted transformations of the joints influencing it, as follows:</p><p>v(t) = Σ_{i=1}^{n} ω_i X_i^t X_i^ref v^ref (7.1)</p><p>where v(t) is the deformed vertex at time t, ω_i is the weight of joint i on the vertex, X_i^t is the global transform of joint i at time t, X_i^ref is the inverse global transform of the joint in the reference position, and v^ref is the vertex in its reference position. This technique, first described by Magnenat-Thalmann et al. 
[MTLT88], is known as skeletal subspace deformation, or skinning [Lan98].</p><p>The skinning can be efficiently performed by the GPU: the deformable mesh sends the joint transformations of its skeleton to the GPU, which takes care of moving each vertex according to its joint influences. However, it is important to take into account the limitations of today's graphics cards, which can store only up to 256 atomic values, i.e., 256 vectors of 4 floating-point numbers. The joint transformations of a skeleton can be sent to the GPU as 4×4 matrices, i.e., four atomic values each. This way, the maximum number of joints a skeleton can have reaches</p><p>256 / 4 = 64 (7.2)</p><p>When wishing to perform hand and facial animation, 64 bones are not sufficient. Our solution is to send each joint transformation to the GPU as a unit quaternion [Sho85] and a translation, i.e., two atomic values. This doubles the number of joints that can be sent. Note that one usually does not wish to use all the atomic structures of a GPU exclusively for the joints of a skeleton, since the GPU is usually also needed to process other data.</p><p>Although rendering deformable meshes is costly, due to the expensive vertex skinning and joint transmission, doing without them would be a great drop in quality: They are the most flexible representation to animate, allowing even facial and hand animation (if using a sufficiently detailed skeleton). Such animation sequences, called skeletal animations, are cheap to store: for each keyframe, only the transformations of deforming joints, i.e., those moved in the animation, need to be kept. 
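The per-vertex deformation of Eq. (7.1) can be prototyped on the CPU as a direct sum over joint influences. This is a minimal sketch, with 4×4 matrices as nested lists and per-joint weights assumed to sum to 1; all names are ours, not the book's:

```python
# CPU-side sketch of Eq. (7.1): each vertex is the weighted sum of its
# influencing joints' transforms applied to the reference-pose vertex.
# Matrices are 4x4 row-major nested lists; vertices are homogeneous 4-vectors.

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a homogeneous 4-vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def skin_vertex(v_ref, influences):
    """influences: list of (weight, X_t, X_ref_inv) per joint, as in Eq. (7.1).
    X_ref_inv is the inverse global transform of the joint in the reference pose."""
    out = [0.0, 0.0, 0.0, 0.0]
    for w, x_t, x_ref_inv in influences:
        # Move the vertex into the joint's local frame, then to its posed frame.
        p = mat_vec(x_t, mat_vec(x_ref_inv, v_ref))
        out = [o + w * p_i for o, p_i in zip(out, p)]
    return out
```

On the GPU, the same sum is evaluated in the vertex shader, with the joint transforms read from the uniform storage discussed above.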
Thus, a tremendous quantity of those animations can be exploited in the simulation, increasing crowd movement variety.</p><p>Procedural and composited animations are suited for this representation, e.g., idle motions can be generated on the fly [EGMT06].</p><p>Blending is also possible, for smooth transitions between different skeletal animations.</p><p>Unfortunately, the cost of using deformable meshes as the sole representation of virtual humans in a crowd is prohibitive. We therefore use them in limited numbers and only at the forefront of the camera. Note that before switching to rigid</p></li><li><p>meshes, we use several deformable meshes, keeping the same animation algorithm, but with a decreasing number of triangles.</p><p>Skilled designers are required to model skinned and textured deformable meshes. But once finished, these meshes are automatically used as the raw material to derive all subsequent representations: the rigid meshes and the impostors.</p><p>7.2.3 Rigid Mesh</p><p>A rigid mesh is a precomputed geometric posture of a deformable mesh, thus sharing the very same appearance. A rigid animation sequence is always inspired by an original skeletal animation, and from an external point of view, both look alike. However, the process to create them is different. To compute a keyframe of a rigid animation, the corresponding keyframe of the skeletal animation is retrieved. It provides a skeleton posture (or joint transformations). Then, as a preprocessing step, each vertex is deformed on the CPU, as opposed to a skeletal animation, where the vertex deformation is achieved online, on the GPU. Once the rigid mesh is deformed, it is stored as a keyframe, in a table of vertices, normals (3D points), and texture coordinates (2D points). This process is repeated for each keyframe of a rigid animation. At runtime, a rigid animation is simply played as the succession of these postures, or keyframes. 
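The baking step just described can be sketched as follows; here `skin` stands in for the skeletal deformation of Eq. (7.1), and all names are illustrative rather than the book's:

```python
# Sketch of baking a rigid animation: for every keyframe of a skeletal clip,
# deform each vertex once on the CPU and store the resulting geometry.

def bake_rigid_animation(ref_vertices, keyframes, skin):
    """keyframes: per-frame joint transforms; skin(v, joints) deforms one vertex.
    Returns one baked vertex table per keyframe."""
    baked = []
    for joints in keyframes:
        # Deform every vertex with this frame's skeleton posture...
        frame = [skin(v, joints) for v in ref_vertices]
        # ...and keep only the geometry, not the skeleton.
        baked.append(frame)
    return baked

def play_rigid(baked, t, fps=20):
    """At runtime, playback is just an indexed lookup: no skinning, no joint upload."""
    return baked[int(t * fps) % len(baked)]
```

This makes the runtime cost of a rigid mesh a plain vertex-buffer draw, at the price of storing the full geometry per keyframe.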
There are several advantages to using such a rigid mesh representation:</p><ul><li>It is much faster to display, because the skeleton deformation and vertex skinning stages are already done and stored in keyframes.</li><li>The communication between the CPU and the GPU is kept to a minimum, since no joint transformations need to be sent.</li><li>It looks exactly the same as the skeletal animation used to generate it.</li></ul><p>The gain in speed brought by this representation is considerable. It is possible to display about 10 times more rigid meshes than deformable meshes. However, the rigid meshes need to be displayed farther from the camera than deformable meshes, because they allow for neither procedural animations nor blending, and no composited, facial, or hand animation is possible.</p><p>7.2.4 Impostor</p><p>Impostors are the least detailed representation, and are extensively exploited in the domain of crowd rendering [TLC02b, DHOO05]. An impostor represents a virtual human with only two textured triangles, forming a quad, which is enough to give the desired illusion at long range from the camera. Similarly to a rigid animation, an impostor animation is a succession of postures, or keyframes, inspired by an original skeletal animation. The main difference from a rigid animation is that it is</p></li><li><p>only a 2D image of the posture that is kept for each keyframe, instead of the whole geometry. Creating an impostor animation is complex and time-consuming. Thus, its construction is achieved in a preprocessing step, and the result is then stored in a database in a binary format (see Sect. 7.5), similarly to a rigid animation. We detail here how each keyframe of an impostor animation is created. The first step when generating such a keyframe for a human template is to create two textures, or atlases:</p><p>A normal map, storing in its texels the 3D normals as RGB components. 
This normal map is necessary to apply the correct shading to the virtual humans rendered as impostors. Indeed, if the normals were not saved, incorrect shading would be applied to the virtual human, since it is represented with only two triangles. Switching from a rigid mesh to an impostor would thus lead to severe popping artifacts.</p><p>A UV map, storing in its texels the 2D texture coordinates as RG components. This information is also very important, because it allows one to correctly apply a texture to each texel of an impostor. Otherwise, we would need to generate an atlas for every texture of a human template.</p><p>Since impostors are only 2D quads, we need to store normals and texture coordinates from several points of view, so that, at runtime, when the camera moves, we can display the correct keyframe from the correct camera viewpoint. In summary, each texture described above holds a single mesh posture seen from several points of view. This is why we also call such textures atlases. We illustrate in Fig. 7.1 a 1024×1024 atlas for a particular keyframe. The top of the atlas is used to store the UV map, and its bottom the normal map.</p><p>The main advantage of impostors is that they are very efficient, since only two triangles per virtual human are displayed. Thus, they constitute the largest part of the crowd. However, their rendering quality is poor, and thus they cannot be exploited close to the camera. Moreover, the storage of an impostor animation is very costly, due to the high number of textures that need to be saved.</p><p>We summarize in Table 7.1 and Fig. 7.2 the performance and animation storage for each virtual human representation. Observe that each step down the representation hierarchy allows one to increase by an order of magnitude the number of displayable characters. Also note that the faster a representation is to display, the larger its animation storage. 
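The viewpoint-dependent lookup into such an atlas can be sketched as follows. The uniform sampling of viewpoints around the character and the tile indexing are our assumptions, not the book's exact layout:

```python
# Sketch of picking an impostor atlas tile from the camera's position.
# Assumes n_views viewpoints sampled uniformly around the character's
# vertical axis; tile indexing is illustrative.
import math

def impostor_tile(char_pos, char_facing, cam_pos, n_views):
    """Return the atlas tile index for the current camera viewpoint.
    char_facing is the character's heading in radians, in the ground plane."""
    dx = cam_pos[0] - char_pos[0]
    dz = cam_pos[2] - char_pos[2]
    # Horizontal angle from the character's facing direction to the camera.
    angle = (math.atan2(dz, dx) - char_facing) % (2.0 * math.pi)
    # Snap to the nearest sampled viewpoint.
    return int(round(angle / (2.0 * math.pi / n_views))) % n_views
```

At runtime the quad is then textured with that tile of the UV and normal atlases for the current keyframe.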
Moreover, rigid meshes and impostors are stored in GPU memory, which is usually much smaller than CPU memory. Figure 7.3 summarizes the shared resources inside a human template.</p><p>7.3 Architecture Pipeline</p><p>Modeling a varied crowd of virtual humans, individual by individual, mesh by mesh, and texture by texture, would be extremely time-consuming and would also require an army of dedicated skilled designers. Moreover, it would pose an evident storage problem. We adopt a different strategy to model such a crowd, and concentrate our</p></li><li><p>Fig. 7.1 A 1024×1024 atlas storing the UV map (above) and the normal map (below) of a virtual human performing a keyframe of an animation from several points of view</p><p>efforts on creating only a few human templates. From this reduced set, we then instantiate thousands of different characters. We use different techniques to obtain variety each time a character is instantiated from a human template. More details on the specific variety methods can be found in Chap. 3.</p></li><li><p>Table 7.1 Storage space in [MB] for 1 second of an animation clip of (a) deformable meshes, (b) rigid meshes, and (c) impostors</p><table><tr><th></th><th>Max Displayable Number @30 Hz</th><th>Animation Frequency [Hz]</th><th>Animation Sequence Storage [MB/s]</th><th>Memory Location</th></tr><tr><td>(a) Deformable Meshes</td><td>200</td><td>25</td><td>0.03</td><td>CPU</td></tr><tr><td>(b) Rigid Meshes</td><td>2000</td><td>20</td><td>0.3</td><td>GPU</td></tr><tr><td>(c) Impostors</td><td>20000</td><td>10</td><td>15</td><td>GPU</td></tr></table><p>Fig. 7.2 Storage space in [MB] for 1 second of an animation clip of (a) a deformable mesh, (b) a rigid mesh, and (c) an impostor</p><p>In this section, we describe the main stages of the architecture pipeline along with the data flowing through them. Figure 7.4 depicts the different stages: at each frame, data flow sequentially through each one of them, from top to bottom. Simulating a crowd of virtual humans is an extremely demanding task, even for modern processors. 
An architecture sustaining thousands of realistic characters needs to be hardware-friendly. Indeed, simple approaches, i.e., treating virtual humans in no particular order, as they come, tend to produce too many state switches for both the CPU and the GPU. A more efficient use of the available computing power is required in order to get closer to hardware peak performance: data flowing through the same path of the pipeline need to be grouped. As a consequence, at the beginning...</p></li></ul>
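The grouping strategy described here can be sketched as a per-frame sort of render jobs by (human template, representation), so that costly state switches (shaders, textures, meshes) happen once per group rather than once per character; the job fields are illustrative:

```python
# Sketch of grouping per-frame render jobs into slots to minimize state
# switches: sort by (human template, representation), then bind state once
# per group and draw all its instances.
from itertools import groupby

def group_render_jobs(jobs):
    """jobs: list of (template_id, representation, instance_id) tuples.
    Returns [((template_id, representation), [instance ids]), ...]."""
    key = lambda j: (j[0], j[1])
    ordered = sorted(jobs, key=key)
    # One slot per (template, representation): bind state once, draw many.
    return [(k, [j[2] for j in grp]) for k, grp in groupby(ordered, key=key)]
```

With such slots, the renderer performs one state setup per (template, representation) pair instead of one per character, which is the point of sorting the data early in the pipeline.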