
Chapter 1
Artistic Rendering of Portraits

Mingtian Zhao and Song-Chun Zhu


Abstract This chapter presents an overview of the latest computerized artistic portrait rendering techniques. Artistic rendering of portraits can be viewed as an image generating process controlled by two factors. The first one is face fidelity, meaning a rendered portrait image should preserve a certain amount of the original face's information. The second factor is the artistic style chosen to simulate different media such as sketch and painting. In the literature, portrait rendering algorithms either adopt different models and data structures to represent the facial information or use different graphical elements and compositional methods to simulate various media and styles. These two factors essentially reveal the two principles in portraiture, namely the pursuits of likeness and aesthetics.

1.1 Introduction

Portraiture, the artistic representation of the appearances and expressions of human faces, is one of the oldest and most popular genres in visual arts. Generally, there are two essential factors to consider in creating a portrait.

• The first factor is face fidelity. A portrait should preserve a certain amount of the original face's information, ensuring not only that it can be recognized as a face picture, but also that the person in the portrait is perceived, in appearance or character, as appropriately similar to the actual person or to the person shown in a photograph.

Mingtian Zhao
University of California, Los Angeles, California 90095-1554, USA
e-mail: [email protected]

Song-Chun Zhu
University of California, Los Angeles, California 90095-1554, USA
e-mail: [email protected]


• The second factor is the artistic style of the portrait picture, chosen to simulate different media such as sketch, painting, paper-cut, and caricature. These styles/forms provide unique dictionaries of visual elements used to express the various facial structures and appearances.

These two factors vary with different ages and genres of portraiture, revealing its two principles, namely the pursuits of likeness and aesthetics. For example, before the invention of photography in the 19th century, mainstream artists pursued accurate likeness by studying the structure of the bones and muscles beneath the facial skin, practicing their skills at depicting them, developing pigments made from various materials, and even using external tools such as mirrors and pinhole imaging to improve fidelity. Nowadays, with the popularity of digital cameras, perfect fidelity is easily available, yet many modern portrait artists depict only rough or even distorted likeness. Instead, they resort to new styles and techniques to evoke strong psychological and emotional reactions in the audience, demonstrating the sense of aesthetics.

These two factors/principles also apply to computerized artistic rendering of portraits, the simulation of portraiture on the computer. From an image analysis and synthesis perspective, using W to denote the facial information and ∆ to denote the elements composing an image, a natural image (photograph) I_N is generated with

I_N = f(W_N; ∆_N),    (1.1)

and in a similar way, an artistic portrait I_A can be synthesized with

I_A = g(W_A; ∆_A),    (1.2)

where f and g are image generating functions (rendering processes). Interestingly, Eqs. (1.1) and (1.2) differ in all three of their aspects:

• W_N ≠ W_A. The facial information W usually contains features such as geometry (2D or 3D), appearance, texture, color, and illumination. To generate a realistic photograph, W_N should usually approximate the truth very closely. In contrast, W_A often captures only the part of the information interesting to artistic perception, which many artists regard as the essence of a face.

• ∆_N ≠ ∆_A. In the image analysis and computer vision literature, ∆_N is usually modeled with PCA, wavelets such as Gabor bases, image patches, etc. ∆_A, however, is usually a dictionary of graphical elements used in creating artworks, for example, graphite sketches, paint brush strokes, etc.

• f ≠ g. While f is usually a simple linear combination of the image elements, the portrait rendering process g can be much more complex, involving content-oriented algorithms for manipulating the sketches, strokes, etc. (see the code sketch following this list).
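To make this contrast concrete, the minimal Python sketch below (our illustration, not code from any surveyed system) composes an image in both ways: f_linear as a weighted sum of basis images, in the spirit of a PCA dictionary ∆_N, and g_compose as a sequential placement of opaque stroke patches, whose result depends on stroke order and is not a linear function of its elements.

import numpy as np

def f_linear(coeffs, basis):
    """f: a natural image as a linear combination of basis elements (e.g., PCA)."""
    return np.tensordot(coeffs, basis, axes=1)  # sum_k coeffs[k] * basis[k]

def g_compose(canvas, strokes):
    """g: an artistic image composed by sequentially placing opaque strokes.
    Each stroke is (row, col, patch); later strokes overwrite earlier ones,
    so the result is order-dependent and non-linear in its elements."""
    for r, c, patch in strokes:
        h, w = patch.shape
        canvas[r:r+h, c:c+w] = patch
    return canvas

# Toy Delta_N: random basis images; toy Delta_A: dark stroke patches.
basis = np.random.rand(5, 64, 64)
photo = f_linear(np.array([0.2, 0.1, 0.3, 0.25, 0.15]), basis)
portrait = g_compose(np.ones((64, 64)), [(10, 10, np.zeros((4, 30))),
                                         (30, 20, np.zeros((4, 30)))])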

In the non-photorealistic rendering (NPR) [7] literature, there are plenty of studies on computerized artistic portrait rendering with different implementations of W_A and g( · ; ∆_A), corresponding to likeness and aesthetics, respectively.


• To preserve facial fidelity, existing portrait rendering methods adopt different models and data structures to represent selected geometry and appearance features in W_A.

• To simulate different artistic styles, existing methods use different dictionaries of graphical elements, which are maintained in ∆_A, and corresponding compositional algorithms, g.

In the rest of this chapter, we review the latest artistic portrait rendering methods and their respective implementations of the two factors. We organize these methods by the four most studied types of portraits in NPR: sketch, paper-cut, oil-painting, and caricature.

1.2 Sketch

A sketch is a rapidly executed drawing demonstrating the basic shape and appearance features of objects. In this section, we review three types of portrait sketching methods. The first two types depict the boundaries and salient edges/curves in portraits with concise strokes (like stick drawings); the former uses holistic models for the shape of the face, and the latter uses part-based models with greater expressive power. The third type of portrait sketch focuses more on the facial surface, including the appearance caused by illumination and shading effects.

1.2.1 Holistic Models

Li and Kobatake [10] made one of the earliest investigations into generating facial sketches from photographs. Their method consists of three steps:

1. Color coordinate transformation, in which an input image is first processed to enhance its saturation component and then transformed to the YIQ color space, where the Y channel represents the luma information and the I and Q channels represent the chrominance information. These channels are used for extracting the face area and some facial parts: lips are red, so they have relatively large values in Q; the face area, with skin color, is generally larger in Q and smaller in I than the dark gray background; and the black pupils of the eyes are usually darker than other parts, as reflected in Y (see the code sketch after this list).

2. Facial component detection, in which, in addition to the lips and pupils, facial parts such as the eyes, mouth, nose, and chin are located with rough edges detected using the Y channel of the image.

3. Approximation of edges with feature points and feature curves. The method takes advantage of a facial sketch representation with 35 feature points connected by feature curves, as shown in Fig. 1.1a, in which spliced second-order polynomials

are used for approximating the edges of the mouth, eyes, nose, and chin. The 35 feature points are then detected as characteristic points on the curves.

Fig. 1.1 (a) Feature points and feature curves used for extracting facial sketch images by Li and Kobatake [10, 11]. (b) Example results of facial sketch extraction using their method, from [11]. ©IEEE.
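As a concrete illustration of step 1 above, the following sketch converts an RGB image to the YIQ color space using the standard NTSC transform matrix; the thresholds used to pick out lip and pupil candidates are hypothetical placeholders, not values from [10].

import numpy as np

# Standard NTSC RGB -> YIQ matrix (Y: luma; I, Q: chrominance).
RGB2YIQ = np.array([[0.299, 0.587, 0.114],
                    [0.596, -0.274, -0.322],
                    [0.211, -0.523, 0.312]])

def rgb_to_yiq(rgb):
    """rgb: H x W x 3 array in [0, 1]; returns the Y, I, Q channels."""
    yiq = rgb @ RGB2YIQ.T
    return yiq[..., 0], yiq[..., 1], yiq[..., 2]

rgb = np.random.rand(8, 8, 3)      # stand-in for a face photograph
Y, I, Q = rgb_to_yiq(rgb)
lip_candidates = Q > 0.15          # hypothetical threshold: lips are red, large Q
pupil_candidates = Y < 0.2         # hypothetical threshold: pupils are dark, small Y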

In their follow-up work [11], the detailed algorithms of the method were improved by adding a symmetry measure, a novel rectangle filter, a geometric template, and morphological processing, all of which led to more robust detection of the positions and edges of facial parts. Figure 1.1b shows two example facial sketches extracted using the method.

In Li and Kobatake's methods, the two factors introduced in Section 1.1 are implemented in a very straightforward way. The facial information W_A is represented with the shape model shown in Fig. 1.1a, which is extracted from the color and gradient features of the input image. This ensures that the portrait sketch looks similar to the photograph. As for the second factor, ∆_A simply defines a line drawing style, with the feature curves fitted using spliced second-order polynomials.

To further improve the results of Li and Kobatake's method, especially on the aesthetic aspect, many studies have been carried out recently. Chen et al. [3] developed an example-based facial sketch generation system whose pipeline is shown in Fig. 1.2. Observing that the artistic styles of sketches vary among different artists and cannot be easily summarized by precise rules (such as polynomial curves up to a specific order or curvatures of certain degrees), their system refers to a set of training examples to obtain their styles. Each training example is a pair of a portrait photograph and its corresponding sketch image created by artists. An active shape model (ASM) [6] for the face is also attached to each example, with the landmark points manually labeled for better accuracy.

• In the training phase, a mean shape of the face is computed by averaging the ASM landmarks of all training examples. Then a geometric transformation is performed on each training example to warp the image and the sketch to match the mean shape. After that, a prior probability model of the sketches is

learned to cover three types of curves: those that always appear, those that probably appear but are independent of other curves, and those that depend on other curves.

Fig. 1.2 Pipeline of the example-based facial sketch generation system developed by Chen et al. [3].

• At runtime, given an input face photograph, the ASM is first applied to extract the landmark points. Then a geometric transformation is defined between these landmarks and the mean shape. After applying this geometric transformation to warp the input image, non-parametric sampling is used to produce a sketch image for the input, which is then warped back to the original shape using the inverse geometric transformation, producing the final sketch (see the code sketch below).
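The geometric side of this pipeline can be sketched under simplifying assumptions: the mean shape is the average of the training landmark sets, and a least-squares similarity transform stands in for the geometric transformation used for warping. Chen et al. [3] do not prescribe this exact form; the function names and data below are illustrative.

import numpy as np

def mean_shape(landmark_sets):
    """Average ASM landmarks over all training examples (each K x 2)."""
    return np.mean(np.stack(landmark_sets), axis=0)

def similarity_transform(src, dst):
    """Least-squares scale/rotation/translation mapping src points onto dst,
    a simple stand-in for the geometric transformation used for warping."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    s, d = src - mu_s, dst - mu_d
    a = (s * d).sum() / (s ** 2).sum()                                  # scale*cos
    b = (s[:, 0] * d[:, 1] - s[:, 1] * d[:, 0]).sum() / (s ** 2).sum() # scale*sin
    R = np.array([[a, -b], [b, a]])
    t = mu_d - mu_s @ R.T
    return lambda pts: pts @ R.T + t

shapes = [np.random.rand(35, 2) for _ in range(10)]  # toy landmark sets
mean = mean_shape(shapes)
warp = similarity_transform(shapes[0], mean)         # warp example 0 to mean shape
aligned = warp(shapes[0])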

Compared to Li and Kobatake's method, Chen et al.'s system improved both W_A and ∆_A. For the facial information W_A, the ASM provides a more robust way to capture the shape of the face than the model in Fig. 1.1a, which relies on local edge detection and curve fitting. For a higher level of aesthetics, the sketches used as graphical elements in the style set ∆_A essentially come from the training examples created by artists, instead of naive polynomial curves, and are encoded within a prior distribution of the sketch curves as introduced above.

1.2.2 Part-Based Models

A disadvantage of the holistic methods is that the rendered sketches contain only stiff lines and curves, whereas sketches created by artists usually have various curve styles and multiple levels of darkness and thickness for different facial components, as well as for the hair. To address this problem, part-based methods were introduced to process different parts of the face separately for greater expressive power. Generally, these part-based models allow richer representations of W_A and a larger variety of elements in ∆_A.


1.2.2.1 Flat Model

Chen et al. [2] proposed an example-based composite sketching approach. This approach decomposes the face into semantic parts (as shown in Fig. 1.3) and uses image-based, instead of curve-based, sketches (as shown in Fig. 1.4). Their system also includes a sub-system for hair rendering, which contributes greatly to the visual quality of the portrait sketches.

Fig. 1.3 Portrait sketching using example-based composite sketching [2]. The input image (a) is decomposed into the facial parts shown in (b); for each part, the best match is found among the training examples, as shown in (c). Sketches are then drawn for each part in (d), and composed in (e) by considering both global and local features. ©ACM.

Fig. 1.4 Example sketches of facial parts used by Chen et al. [2] for ∆_A. ©ACM.

In the composite sketching system of Chen et al., the sketches are split into two layers: a global layer and a local layer. The global layer captures the spatial placement of the facial parts, and the local layer captures how each facial part is depicted. For both layers, features are selected and learned using training image-sketch pairs provided by artists. At runtime, two processing steps are executed:

• Local processing. The input image is first decomposed into facial parts using a refined ASM. Then, for each part, the system finds the best match in shape (according to ASM landmarks) from the training examples. After that, the corresponding sketches of the best matches are adopted.

• Global processing. The sketches of the facial parts are composed according to the learned global spatial model, which adjusts the locations, sizes, and orientations of the parts. (A code sketch of the local matching step follows this list.)
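The following toy sketch illustrates the local matching step, assuming each training example stores per-part ASM landmarks together with the artist's sketch patch for that part; the data layout and the plain L2 shape distance are our assumptions rather than details from [2].

import numpy as np

def best_match(part_landmarks, training_parts):
    """Find the training example whose part shape is closest (L2 on landmarks)."""
    dists = [np.linalg.norm(part_landmarks - ex["landmarks"]) for ex in training_parts]
    return training_parts[int(np.argmin(dists))]

# Toy training set: each example pairs part landmarks with an artist-drawn patch.
training_eyes = [{"landmarks": np.random.rand(8, 2),
                  "sketch": np.random.rand(16, 32)} for _ in range(20)]
input_eye = np.random.rand(8, 2)
chosen = best_match(input_eye, training_eyes)  # its "sketch" patch is then composed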


The hair rendering sub-system extracts both structural (boundary) and detail (streamline) components in the hair area and fits them with curves. The two types of curves are then rendered using their respective example-based strokes (learned from training examples) to synthesize the hair sketch.

Regarding the two factors we have been discussing in this chapter, the global/local hybrid method of Chen et al. preserves face fidelity in W_A in a more flexible way than the global geometric transformation based on the ASM used in their earlier work [3]. For each facial part, the locally best matching sketch is selected and composed into a globally coherent image. However, because the best matches are selected instead of drawn from a learned local sketch model [3], the system may lose a certain degree of likeness when the number of training examples is small; for example, the locally best matches may not be good enough in terms of similarity. On the aesthetic aspect, however, this method achieves much finer detailed appearances by maintaining image-based sketches instead of stiff curves in ∆_A (as shown in Fig. 1.4), where different facial parts may be sketched using different techniques by artists.

1.2.2.2 Hierarchical Model

Since part-based portrait rendering methods have greater expressive power than global methods, more studies in this direction have been dedicated to better models and algorithms for both global and local processing. A powerful model for organizing the facial components is the hierarchical and compositional And-Or Graph (AOG) [20, 22, 21], as shown in Fig. 1.5.

In this AOG, the And-nodes represent decompositions of the face or its parts into sub-components (e.g., decomposing the face into nose, mouth, etc.), and the Or-nodes represent alternative types of a part (e.g., there are multiple ways to sketch a nose, given by either models or examples). On the AOG, complex spatial constraints on the facial parts can be embedded at the And-nodes at multiple levels, for example, by using Markov networks [22]. Also, the photo-sketch similarity measure enforcing the likeness principle can be applied at multiple resolutions (thanks to the hierarchical structure) and optimized by switching at the Or-nodes during rendering.

To create the face AOG, a hierarchical structure is designed by hand with the nodes corresponding to semantic facial parts. Then a set of training sketch examples with their corresponding photographs are collected and manually decomposed according to the structure of the AOG. The decomposed parts are then associated with the nodes in the AOG to construct ∆_N and ∆_A for multiple scales in the hierarchy, as marked on the right side of Fig. 1.5. Using these ∆_N and ∆_A, models for constraining spatial configurations are learned at the And-nodes, and switching probabilities are summarized at the Or-nodes. Note that if we save all examples in the AOG without merging similar ones, the switching probabilities are uniform.

Generating a portrait sketch from an input photograph is equivalent to instantiating a parse graph from the AOG by decomposing at And-nodes and switching at Or-nodes, as shown in Fig. 1.6. An instance optimizing the probabilities at the And- and Or-nodes, as well as the similarity measures at multiple resolutions, is expected

to preserve the fidelity both globally and locally (cf. the global/local hybrid model introduced by Chen et al. [2]).

Fig. 1.5 A three-layer And-Or Graph (AOG) representation of the face [21]. Dark arrows represent the paths for generating an instance from the AOG, by decomposing And-nodes (solid ellipses) into sub-components and choosing among alternative types at Or-nodes (dashed ellipses). As marked on the right side, examples in ∆_N and ∆_A at different scales can be embedded in the AOG. ©IEEE.

Fig. 1.6 A parse graph instantiated from the AOG in Fig. 1.5 [21]. This process essentially generates a portrait sketch by selecting sketches of different parts and composing them. ©IEEE.
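The instantiation process of Fig. 1.6 can be pictured with a small data-structure sketch: And-nodes expand into all of their children, Or-nodes switch to one alternative according to their switching probabilities, and leaves are concrete sketch templates. This captures only the structural skeleton; the spatial constraints, similarity measures, and learned probabilities of [21] are omitted, and all node names are illustrative.

import random

class OrNode:
    """Alternative types of a part; a parse picks one branch."""
    def __init__(self, name, alternatives, probs):
        self.name, self.alternatives, self.probs = name, alternatives, probs

class AndNode:
    """Decomposition of a face/part into sub-components."""
    def __init__(self, name, children):
        self.name, self.children = name, children

def sample_parse(node):
    """Instantiate a parse graph: expand And-nodes, switch at Or-nodes."""
    if isinstance(node, AndNode):
        return {node.name: [sample_parse(c) for c in node.children]}
    if isinstance(node, OrNode):
        choice = random.choices(node.alternatives, weights=node.probs)[0]
        return {node.name: sample_parse(choice)}
    return node  # leaf: a concrete sketch template ID

face = AndNode("face", [
    OrNode("nose", ["nose_type_1", "nose_type_2"], [0.6, 0.4]),
    OrNode("mouth", ["mouth_open", "mouth_closed"], [0.3, 0.7]),
])
print(sample_parse(face))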

Compared with the flat model described in the previous section, the hierarchical AOG has two advantages with respect to the face fidelity factor.

1. The AOG can encode spatial constraints more efficiently with the And-nodes at multiple levels of the graph. Furthermore, if multiple selections at Or-nodes are clustered and similar ones are merged, the AOG compresses the storage of all training examples, which is important when the training set is large.

2. The hierarchical AOG structure makes it easier to enforce likeness for different facial parts at multiple resolutions, especially when we want different weights for the likenesses of different parts/levels. For example, the likeness of the eyes could be more important than that of the eyebrows, while the appearance of an entire eye might be more important than those of the eyelids at an unnecessarily high resolution. In this sense, the power of the face AOG cannot be fully exploited until comprehensive psychological studies on this topic are carried out.

As for the artistic style factor, Xu et al. [20, 22, 21] still used sketch examples created by artists. To improve the visual effects, Min et al. [14] collected more stylish training examples in ∆_A and added two sub-systems for processing the hair and clothes, respectively. Figure 1.7 includes an example generated by their system.

Fig. 1.7 An example portrait sketch generated by the system of Min et al. [14]. ©Springer.

1.2.3 Sketching the Facial Surface

In addition to the concise sketching styles discussed in Sections 1.2.1 and 1.2.2, there is another popular sketching style which pays more attention to the facial surface than to the boundaries and salient edges/curves. This style depicts the appearance of the facial surface as affected by illumination and shading effects. There are a few studies on this type of portrait sketch in the NPR literature.

Wang and Tang [19] proposed an example-based method for synthesizing portrait sketches with surface strokes to depict the facial appearance. Figure 1.8 displays the framework of their method.

Fig. 1.8 Framework of the portrait sketch system developed by Wang and Tang [19].

In this method, a triangular mesh is attached to the face, with its vertices located at fiducial points on the face, for example, the eyeballs, the corners of the mouth, etc. Given a training set containing pairs of portrait photographs and their corresponding sketches created by artists, the fiducial points are located in each image (both photographs and sketches), and the corresponding triangular mesh is constructed for each pair. Then two eigentransformations from photograph to sketch are computed, for shape (coordinates of mesh vertices) and texture (grayscale images), respectively. To render a given input photograph, its fiducial points and triangular mesh are first computed and then used to warp the image to the mean shape derived from the training set. This essentially separates the shape component G_p and the texture component I_p, which then pass through their respective eigentransformations computed before, generating the sketch shape G_s and the sketch texture I_s. After warping I_s back to the original shape of the input, we obtain the final sketch.
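The texture branch of this pipeline can be illustrated with a simplified eigentransformation sketch: the input texture (already warped to the mean shape and flattened) is expressed as a linear combination of the training photographs via PCA, and the same combination weights are applied to the paired artist sketches. This follows the general idea of [19] but is a toy rendition; the data, dimensions, and variable names are placeholders.

import numpy as np

def eigentransform(x, photos, sketches, k=10):
    """Reconstruct x as a combination of training photos, then apply the same
    combination to the paired sketches (images are flattened vectors,
    already warped to the mean shape)."""
    P = photos - photos.mean(0)              # centered training photos
    U, S, Vt = np.linalg.svd(P, full_matrices=False)
    coeff = (x - photos.mean(0)) @ Vt[:k].T  # project onto top-k eigenvectors
    weights = (U[:, :k] / S[:k]) @ coeff     # combination weights over examples
    return sketches.mean(0) + weights @ (sketches - sketches.mean(0))

photos = np.random.rand(50, 4096)    # 50 training photo vectors (64x64 flattened)
sketches = np.random.rand(50, 4096)  # paired artist sketches
synthesized = eigentransform(np.random.rand(4096), photos, sketches)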

Although the two differ in the detailed models and algorithms for shape and texture, the global design of the method of Wang and Tang [19] is essentially very similar to that of Chen et al. [2]. Both try to separate the shape and texture information, and each defines certain mapping relationships for both components in order to transfer a photograph into a sketch image.

• For shape, Chen et al. adopted the ASM, while Wang and Tang used a triangular mesh.

• For texture, Chen et al. used a non-parametric sketch model dependent on the photograph, while Wang and Tang used an eigentransformation in the image space.

Therefore, the face fidelity and artistic style factors are implemented here in a similar way to the flat model of Chen et al., but with different details. The fidelity is enforced using the triangular mesh attached to the face with shape and texture eigentransformations, and the artistic style is defined by the sketch examples created by artists.

Another interesting work on sketching the facial surface was presented by Tresset and Leymarie [18]. In this work, sketches are drawn randomly as Bezier curves at different densities for each color-clustered region in the segmented face area. In this way, the face fidelity factor is only roughly preserved at a low-resolution level, in terms of the approximate grayscale levels of different regions, and is dropped for most


details. Meanwhile, this random sketching process, defined in its style ∆_A, produces a unique appearance which may appear aesthetic to people. A toy version of this idea is sketched below.
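In the sketch below, binary masks stand in for the color-clustered regions of the segmented face area, and each region receives random cubic Bezier strokes with a density that grows with its assigned darkness. The thresholds and stroke counts are invented for illustration and are not from [18].

import numpy as np

def bezier(p0, p1, p2, p3, n=20):
    """Sample a cubic Bezier curve defined by four 2D control points."""
    t = np.linspace(0, 1, n)[:, None]
    return (1-t)**3*p0 + 3*(1-t)**2*t*p1 + 3*(1-t)*t**2*p2 + t**3*p3

def sketch_region(mask, darkness, rng, strokes_per_unit=30):
    """Scatter random Bezier strokes inside a region; darker regions get more."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1)
    curves = []
    for _ in range(int(darkness * strokes_per_unit)):
        ctrl = pts[rng.integers(len(pts), size=4)]  # 4 random control points
        curves.append(bezier(*ctrl.astype(float)))
    return curves

rng = np.random.default_rng(0)
gray = np.random.rand(32, 32)
# Toy "clustering": two levels by thresholding, in the spirit of the
# color clustering used in the paper.
dark_mask, light_mask = gray < 0.4, gray >= 0.4
strokes = (sketch_region(dark_mask, darkness=0.9, rng=rng)
           + sketch_region(light_mask, darkness=0.2, rng=rng))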

1.3 Paper-Cut

The second portrait genre we review in this chapter is the paper-cut, which is essentially a binary image, called a Mooney image in the psychological literature [8]. The binarization process is inhomogeneous across different facial areas and usually does not correspond strictly to the grayscale levels. While there have been a few studies on paper-cuts in the NPR literature, to the best of our knowledge, the only dedicated portrait paper-cut work is that of Meng et al. [13].
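To make the notion of inhomogeneous binarization concrete, the toy sketch below thresholds each facial region relative to its own mean intensity with a per-region offset, so different areas binarize differently. This is only a schematic stand-in, not the dynamic thresholding algorithm of Meng et al. (shown later in Fig. 1.12).

import numpy as np

def region_binarize(gray, region_masks, offsets):
    """Binarize each facial region with its own threshold: the region's mean
    intensity shifted by a per-region offset (inhomogeneous binarization)."""
    out = np.zeros_like(gray, dtype=bool)
    for mask, offset in zip(region_masks, offsets):
        thresh = gray[mask].mean() + offset
        out[mask] = gray[mask] < thresh  # dark pixels become paper-cut ink
    return out

gray = np.random.rand(64, 64)
eye_mask = np.zeros_like(gray, dtype=bool)
eye_mask[20:28, 15:30] = True
skin_mask = ~eye_mask
binary = region_binarize(gray, [eye_mask, skin_mask], offsets=[0.1, -0.2])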

Fig. 1.9 Parse graph of a face image and its corresponding paper-cut [13].

The method of Meng et al. adopts the hierarchical AOG face representation introduced in Section 1.2.2.2, as shown in Fig. 1.9. Its global design is very similar to that of Min et al. [14], except for a few differences in the detailed algorithms; in particular, the sketch examples of the latter are replaced by paper-cut versions in the former, as shown in Fig. 1.10. Meng et al. also used separate sub-systems for the face, the hair, and the clothes.

Figure 1.11 includes two results generated by the system of Meng et al. The structural information of the face is represented by the AOG model (the fidelity factor), while the style factor and aesthetics rely on the paper-cut examples in ∆_A that were created by artists.

Meng et al. [13] is thus far the only work that has studied the trade-off between fidelity and aesthetics in the rendering process. In this work, fidelity is achieved by a binary proposal generated from the source image using dynamic thresholding, as shown in Fig. 1.12, and the aesthetic level is controlled by the compatibility of the facial parts selected from the template dictionary ∆_A. To obtain a trade-off between the two,

a weighted sum of the cost functions of the two,

d + λc,    (1.3)

is measured, where d is the difference between the template image and the binary proposal, and c is the number of distinct template instances from which the facial parts are selected, which is a metric of compatibility (assuming parts are more compatible if they come from the same template). By tuning λ, it is possible to obtain a continuous spectrum from pursuing only likeness, through a weighted combination, to pursuing only aesthetics; a toy version of this selection is sketched below.

Fig. 1.10 Paper-cut graphical elements in ∆_A for facial parts used in the system of Meng et al. [13].

Fig. 1.11 Example paper-cuts generated using the method of Meng et al. [13].

Fig. 1.12 The dynamic thresholding algorithm for computing the binary proposal used by Meng et al. [13].
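In the toy sketch below, each candidate part instance carries the ID of the source template it was cut from; d counts pixel disagreements between the composed parts and the binary proposal, and c counts the distinct source templates used, following Eq. (1.3). The exhaustive search and the data layout are our simplifications, not details from [13].

import itertools
import numpy as np

def tradeoff_select(candidates, binary_proposal, lam):
    """Pick one template instance per facial part, minimizing d + lambda * c:
    d = pixel difference to the binary proposal, c = number of distinct
    source templates used (fewer templates = more compatible parts)."""
    best, best_cost = None, np.inf
    for combo in itertools.product(*candidates.values()):
        rendered = sum(inst["image"] for inst in combo) > 0
        d = np.sum(rendered != binary_proposal)
        c = len({inst["template_id"] for inst in combo})
        cost = d + lam * c
        if cost < best_cost:
            best, best_cost = combo, cost
    return best

# Toy setup: two parts, each with candidates cut from two source templates.
proposal = np.random.rand(32, 32) < 0.3
candidates = {part: [{"template_id": t, "image": np.random.rand(32, 32) < 0.3}
                     for t in range(2)] for part in ("eyes", "mouth")}
choice = tradeoff_select(candidates, proposal, lam=5.0)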


1.4 Oil-Painting

Besides sketch and paper-cut, another type of visual art in which portraiture plays an important role is oil-painting. Portrait painting has also been studied in the artistic rendering literature.

Zhao and Zhu [24] developed a system for rendering portrait paintings from photographs using active templates. The main idea of this system is similar to that of many of the sketching methods introduced above. However, because the appearance of paintings is much more detailed than that of sketches, the algorithms for depicting these details are especially important for both likeness and aesthetics.

The system of Zhao and Zhu has a few crucial components:

• A dictionary of portrait painting templates. Given photographs, artists are asked to paint portraits on a screen with a digitizer, using image-example-based brush strokes as shown in Fig. 1.13. The color statistics (mean and variance), geometry (control points of the backbone curve), and texture (example ID in the brush dictionary [23]) of each stroke used by the artists to compose the portrait are recorded. This gives the complete information about the sequence of brush strokes for each portrait painting, as shown in Fig. 1.14b (only part of the strokes are visualized). A dictionary of these portrait painting examples, along with their corresponding photographs, is constructed as shown in Fig. 1.15. The elements of the dictionary are used as templates later in rendering.

• A representation of the spatial configuration of the face and the brush strokes. For each instance in the dictionary, the shape of the face is captured using an active appearance model (AAM) [5] with 83 landmarks. The positions of these landmarks are put in the same coordinate system as the control points describing the geometry of the brush strokes. Shape matching between two different faces (e.g., an input photograph to paint from and a reference example in the dictionary) is achieved by computing a thin plate spline (TPS) transformation [1] between the coordinate pairs of their AAM landmarks.

• A brush stroke rendering algorithm. Given an input face photograph, in order to synthesize a corresponding portrait painting, we first select a reference example from the dictionary of templates. Then we compute the TPS transformation from the shape of the reference example to that of the input photograph. Afterward, this TPS transformation is applied to the control points of all brush strokes in the reference example, and the output control points with new coordinates define the new geometry of the strokes composing the portrait corresponding to the input photograph, as shown in Fig. 1.14c. Finally, the color of each brush stroke is transferred to match the color of the target photograph, and the brush strokes are superimposed to compose the resulting painting image, as shown in Figs. 1.14d,e. A code sketch of the geometric step follows this list.
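The geometric part of this pipeline is compact enough to sketch. In the fragment below, scipy's RBFInterpolator stands in for the TPS solver of [1], and the Stroke fields mirror the recorded color/geometry/texture attributes described above; treat it as an illustration under those assumptions, not the system's actual implementation:

```python
import numpy as np
from dataclasses import dataclass
from scipy.interpolate import RBFInterpolator

@dataclass
class Stroke:
    control_points: np.ndarray  # (k, 2) control points of the backbone curve
    color_mean: np.ndarray      # recorded color statistics: mean
    color_var: np.ndarray       # recorded color statistics: variance
    texture_id: int             # example ID in the brush dictionary [23]

def warp_strokes(strokes, ref_landmarks, input_landmarks):
    """Warp a reference template's strokes onto an input face.

    ref_landmarks and input_landmarks are (83, 2) arrays of AAM landmarks
    in the same coordinate system as the stroke control points. A thin
    plate spline fitted on the landmark pairs is applied to every stroke's
    control points, yielding the stroke geometry for the new portrait.
    """
    tps = RBFInterpolator(ref_landmarks, input_landmarks,
                          kernel='thin_plate_spline')
    return [Stroke(tps(s.control_points), s.color_mean, s.color_var, s.texture_id)
            for s in strokes]
```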

Figure 1.16 displays example results generated using Zhao and Zhu's system. In this system, the face fidelity factor is handled by the AAM landmarks and a few algorithms in the portrait painting system:

• Template selection. For a given input photograph, the top-10 best matched examples from the dictionary, in terms of both shape and appearance, are reported as candidates for selection.



Fig. 1.13 The example-based brush stroke model used by Zhao and Zhu [24]: (a) original brush stroke; (b) morphed brush stroke. ZZ:fig:Zhao11brush


Fig. 1.14 Pipeline of the portrait painting system of Zhao and Zhu [24]. See explanations in text. ZZ:fig:Zhao11pipeline

Fig. 1.15 Example portrait painting templates from the dictionary of Zhao and Zhu [24]. ZZ:fig:Zhao11dic

Fig. 1.16 Example results generated using Zhao and Zhu's system [24]. ZZ:fig:Zhao11Results



This avoids using templates that differ too much from the target image, which could cause problems in likeness and rendering.

• Shape matching and stroke deformation. Shapes of different faces are matched through the 83 AAM landmark points. Using stroke deformation defined by the matching between landmarks ensures that the strokes are rendered at appropriate positions, with correct curvatures, to depict the facial surface and parts.

• Stroke color transfer, which maps the colors of all strokes to the target photograph in a coherent way. This contributes crucially to preserving the global appearance of the original photograph.
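One simple way to realize such a coherent transfer (a minimal sketch, not the exact transfer procedure of [24]) is to re-express each stroke's recorded color relative to the reference photograph's statistics and then re-apply the target photograph's statistics:

```python
import numpy as np

def transfer_stroke_color(stroke_color, ref_region, target_region):
    """Shift a stroke's color so its relation to the target photograph
    matches its recorded relation to the reference photograph.

    stroke_color: (3,) RGB color recorded with the stroke.
    ref_region / target_region: (h, w, 3) photograph patches under the
    stroke in the reference example and in the input photograph.
    Mean-and-variance matching is an illustrative stand-in for the
    color transfer step of Zhao and Zhu [24].
    """
    ref = ref_region.reshape(-1, 3)
    tgt = target_region.reshape(-1, 3)
    ref_mu, ref_sigma = ref.mean(0), ref.std(0)
    tgt_mu, tgt_sigma = tgt.mean(0), tgt.std(0)
    # Normalize against the reference statistics, re-apply the target's.
    return (stroke_color - ref_mu) / (ref_sigma + 1e-6) * tgt_sigma + tgt_mu
```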

The artistic style factor and the aesthetic are supported by the portrait painting dictionary, including:

• The sequence of sparse but decisive and colorful strokes which, when put together, convey an impression of 3D structures and vibrant contrasts, and

• The individual textured brush strokes, which deliver elegant oil-painting details with illumination and shading.

1.5 Caricature ZZ:sec:caricature

Caricature is a very special type of portrait image. It differs from traditional sketch, paper-cut, and painting in how it manipulates the two factors we study in this chapter: most caricatures explicitly trade a certain degree of fidelity for unique aesthetic effects, for example, by exaggerating features in several parts of the face.

Tominaga et al. [16, 17] developed the PICASSO facial caricaturing system. In the PICASSO system, 445 characteristic points are located on the edges of facial parts, and various expressions are defined as offsets of these points from a mean face (without expression). The idea of deliberately manipulating the facial shape for caricaturing and further exaggeration is the basis of most later studies on portrait caricatures.

Liang et al. [12], whose framework is shown in Fig. 1.17, extended the sketching method of Chen et al. [3] with exaggeration for generating facial caricatures. In its prototype-based shape exaggeration model, the training examples are analyzed and clustered into a few exaggeration prototypes. Each prototype represents a trend of exaggeration in some facial features, towards which the sketch image is warped using a local linear model at runtime to generate caricatures. The exaggeration module added on top of the system of Chen et al. essentially reduces the level of face fidelity while enhancing features exaggerated in the training examples, reflecting a type of aesthetic.

Mo et al. [15] observed that the exaggeration level should depend not only on the absolute difference from the mean shape, but also on the variance of such differences among examples. This idea enables comparison between the exaggeration levels of different facial parts, which corresponds better to our perception. Figure 1.18 displays two of their results.
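The core of this idea fits in a few lines. In the sketch below, a shape feature is exaggerated only where its difference from the mean exceeds a multiple of its standard deviation across training faces; the 2σ gate follows Mo et al.'s description, while the constants and function names are illustrative:

```python
import numpy as np

def exaggerate(coeffs, mean, std, k=1.5, gate=2.0):
    """Variance-normalized exaggeration in the spirit of Mo et al. [15].

    coeffs: per-feature shape coefficients of the input face.
    mean, std: per-feature mean and standard deviation learned from
        training faces.
    A feature's difference from the mean is scaled by k only where it
    exceeds `gate` standard deviations, so a widely distributed feature
    (e.g., mouth width) needs a larger absolute deviation than a tightly
    distributed one (e.g., eye width) before it is emphasized.
    """
    delta = coeffs - mean
    distinctive = np.abs(delta) >= gate * std
    return mean + np.where(distinctive, k * delta, delta)
```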



Fig. 1.17 Framework of the example-based caricature generation system of Liang et al. [12]. ZZ:fig:Liang02framework

Fig. 1.18 Two example results generated using the method of Mo et al. [15]. From left to right: input photographs, photorealistic caricatures, and stylized caricatures. ©ACM. ZZ:fig:Mo04

Fig. 1.19 An example result generated using the caricature algorithm of Chiang et al. [4]. ©Asian Federation of Computer Vision Societies. ZZ:fig:Chiang04



Another interesting work on caricature was presented by Chiang et al. [4], which adds exaggeration on top of a mesh-based representation of facial geometry (similar to Wang and Tang [19]) instead of the contour-based representations used by Tominaga et al. [16, 17] and Liang et al. [12]. This enables more detail in color and texture in the rendered caricature, as shown in Fig. 1.19.

1.6 Summary ZZ:sec:summary

In this chapter, by reviewing recent work on artistic rendering of portraits, we have studied its two essential factors, namely, the face fidelity factor supported by certain face representations stored in WA, and the artistic style chosen to simulate different media with graphical elements and rendering details defined in ∆A and g, as introduced in Section 1.1.

• Fidelity is usually enforced at two different levels. At the local level of the details of facial parts, certain similarity measures are adopted between the photograph and the artistic depiction, or implicitly applied by defining a mapping/transformation between them. At the global level of the face, certain shape models are adopted, such as ASM, AAM, or a triangular mesh, to constrain the spatial configuration of the facial parts. Additionally, the And-Or Graph model has been applied to integrate two or more levels in a single hierarchy.

• The artistic style/format is usually applied by using training-example art pieces or elements created by artists, in addition to some simple rule-based strategies for straightness, smoothness, etc. The unique style of caricature is mainly attributed to the exaggerated features, which are also learnable from training examples.

Guided by these two factors, the various methods in past studies on artistic rendering of portraits pursue the likeness and aesthetic of portraiture.

Despite this progress, a few key questions still stand in the way of a systematic theory of, and solution to, the portrait rendering problem:

• How should we select the important information from WN to be used in WA? In practice, this means choosing a facial model that captures the features essential to artistic perception while ignoring unimportant parts. Also, what is the minimum information needed in WA to satisfy the likeness principle?

• Is there a principled method for balancing likeness against aesthetic, or can the two be separated for independent manipulation under certain circumstances? The study of Leopold et al. [9] gives us some hints on this question, indicating that extrapolation away from a mean face can preserve a person's identity without confusion with other people in the dataset. This supports the caricature rendering methods based on shape exaggeration, which improve the aesthetic without giving up the likeness. However, a comprehensive investigation of this problem has yet to be conducted.



Acknowledgments ZZ:sec:acknowledgments

We would like to thank Meng Meng, Jinli Suo, and Yaling Yang for discussions and help with the experimental data and figures during our study of the likeness and aesthetic principles in portrait rendering, and Amy Morrow for suggestions on the presentation of this chapter. This work has been supported by ONR MURI Grant N000141010933 and DARPA Award FA 8650-11-1-7149.

References

1. Barrodale, I., Skea, D., Berkley, M., Kuwahara, R., Poeckert, R.: Warping digital images using thin plate splines. Pattern Recog. 26(2), 375–376 (1993)

2. Chen, H., Liu, Z., Rose, C., Xu, Y., Shum, H.Y., Salesin, D.: Example-based composite sketching of human portraits. In: Proceedings of NPAR (2004)

3. Chen, H., Xu, Y.Q., Shum, H.Y., Zhu, S.C., Zheng, N.N.: Example-based facial sketch generation with non-parametric sampling. In: Proceedings of ICCV (2001)

4. Chiang, P.Y., Liao, W.H., Li, T.Y.: Automatic caricature generation by analyzing facial features. In: Proceedings of ACCV (2004)

5. Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 681–685 (2001)

6. Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active shape models — their training and application. Comput. Vis. Image Understand. 61, 38–59 (1995)

7. Gooch, A., Gooch, B.: Non-Photorealistic Rendering. A K Peters (2001)

8. Hegdé, J., Thompson, S., Kersten, D.: Identifying faces in two-tone ('Mooney') images: A psychophysical and fMRI study. J. Vision 7(9), Article 624 (2007)

9. Leopold, D.A., O'Toole, A.J., Vetter, T., Blanz, V.: Prototype-referenced shape encoding revealed by high-level aftereffects. Nat. Neurosci. 4(1), 89–94 (2001)

10. Li, Y., Kobatake, H.: Extraction of facial sketch images and expression transformation based on FACS. In: Proceedings of ICIP (1995)

11. Li, Y., Kobatake, H.: Extraction of facial sketch image based on morphological processing. In: Proceedings of ICIP (1997)

12. Liang, L., Chen, H., Xu, Y.Q., Shum, H.Y.: Example-based caricature generation with exaggeration. In: Proceedings of Pacific Graphics (2002)

13. Meng, M., Zhao, M., Zhu, S.C.: Artistic paper-cut of human portraits. In: Proceedings of ACM MM (2010)

14. Min, F., Suo, J.L., Zhu, S.C., Sang, N.: An automatic portrait system based on and-or graph representation. In: Proceedings of EMMCVPR (2007)

15. Mo, Z., Lewis, J.P., Neumann, U.: Improved automatic caricature by feature normalization and exaggeration. In: SIGGRAPH Sketches (2004)

16. Tominaga, M., Fukuoka, S., Murakami, K., Koshimizu, H.: Facial caricaturing with motion caricaturing in PICASSO system. In: Proceedings of the IEEE/ASME International Conference on Advanced Intelligent Mechatronics (1997)

17. Tominaga, M., Hayashi, J.I., Murakami, K., Koshimizu, H.: Facial caricaturing system PICASSO with emotional motion deformation. In: Proceedings of the 2nd International Conference on Knowledge-Based Intelligent Electronic Systems (1998)

18. Tresset, P., Leymarie, F.: Generative portrait sketching. In: Proceedings of VSMM (2005)

19. Wang, X., Tang, X.: Face sketch synthesis and recognition. In: Proceedings of ICCV (2003)

20. Xu, Z., Chen, H., Zhu, S.C.: A high resolution grammatical model for face representation and sketching. In: Proceedings of CVPR (2005)

21. Xu, Z., Chen, H., Zhu, S.C., Luo, J.: A hierarchical compositional model for face representation and sketching. IEEE Trans. Pattern Anal. Mach. Intell. 30(6), 955–969 (2008)

22. Xu, Z., Luo, J.: Accurate dynamic sketching of faces from video. In: Proceedings of CVPR (2007)

23. Zeng, K., Zhao, M., Xiong, C., Zhu, S.C.: From image parsing to painterly rendering. ACM Trans. Graph. 29(1), 2:1–2:11 (2009)

24. Zhao, M., Zhu, S.C.: Portrait painting using active templates. In: Proceedings of NPAR (2011)