smoothness-constrained face photo-sketch · pdf filesmoothness-constrained face photo-sketch...

Smoothness-Constrained Face Photo-Sketch Synthesis Using SparseRepresentation

Liang Chang 1, Xiaoming Deng 2, Mingquan Zhou 1, Fuqing Duan1, Zhongke Wu 1

1 College of Information Science and Technology, Beijing Normal University, China2 Institute of Software, Chinese Academy of Sciences, China

Emails: {changliang, mqzhou}@bnu.edu.cn, [email protected]

Abstract

Face photo-sketch and sketch-photo synthesis haveimportant usages in law enforcement. It is challeng-ing to synthesize face sketches from photos because thedrawing techniques and styles of artists’ depictions arehard to be learned. To synthesize face photos fromsketches is also hard due to its ill-posed nature. In orderto avoid mosaic effects in the existed photo-sketch meth-ods, we propose a smoothness-constrained photo-sketchsynthesis method via sparse representation. The workis an extension of the previous work[1]. The method ismodeled as the minimization of an energy function, alarge scale convex optimization problem with l1-normconstraint. Since previous optimization methods areinfeasible to solve our problem, we propose an itera-tive optimization approach, which decomposes the largescale optimization into a sequence of small scale opti-mizations and solve them iteratively to obtain the ap-proximated optimal solution. The same synthesis strat-egy can be also used to synthesize photos from sketches.Experiments show its effectiveness.

1 Introduction

Face photo-sketch and sketch-photo synthesis have awide range of applications for law enforcement [4, 8].In many scenarios of law enforcement, traditional facerecognition based on photos is infeasible because theface photo of a suspect is unavailable. Only the facesketch of the suspect is available according to the de-scription of the wittiness [4]. Researchers proposedtwo approaches for identification: (a) sketch based faceretrieval; (b) photo based face retrieval. In the firstapproach, face sketches synthesis provides an alterna-tive to the sketch database collection; In the second ap-proach, the synthesized face photo of the suspect is usedfor recognition. Compared with face photos, sketchesare more concise and discriminative [8].

Several studies have been conducted on sketch syn-thesis. Tang [6] developed an eigentransform basedalgorithm, in which the transformation between pho-tos and sketches is assumed to be linear. Liu [4] pre-sented a method which is similar in the spirit to LLE[5]. This method needs a carefully chosen of the num-ber of nearest neighbors. Wang [8] proposed a method

using a multiscale Markov random field model, whichis relatively time-consuming due to use an inferenceprocedure with belief propagation as reported in [8].Many other algorithms are proposed to synthesize facesketches, such as Zhang [11] proposed a sketch synthe-sis by using SVM. In recent years, sparse representationhas been regarded as a breakthrough in signal process-ing and pattern recognition, it has been applied widelyin various computer vision tasks [9][10]. In the previ-ous work [1], we proposed to synthesis face sketch viasparse representation. Sparseness is desired due to itssuccinct representation ability and its discriminative na-ture. Another merit of the method is that it has smallerstorage requirement due to the fact that it only requiresa much smaller succinct dictionary by sparse coding.

In this paper, we propose a smoothness-constrainedface photo-sketch synthesis method, which has the ad-vantage that it alleviates mosaic effects of synthesizedsketches. The method can be outlined as: firstly, thetraining photos and sketches are divided into overlappedpatches. In each patch, photo and sketch patch pairs areused to built a coupled dictionary[3]; Secondly, for eachimage patch in the test photo, we compute its sparse rep-resentation coefficient with respect to the photo basesin coupled dictionary. The sketch patch is recoveredwith the same coefficient and the sketch bases in cou-pled dictionary (Section 2). The coefficient is used asthe initialization of the sketch synthesis results; Thirdly,smoothness-constrained face photo-sketch synthesis ismodeled as minimization of an energy function (Sec-tion 3), which is usually a large-scale convex optimiza-tion problem with l1-norm constraint, and then we pro-pose an efficient decomposition algorithm to solve it byusing a series of small scale convex optimizations. Ex-periments show that the results with our method anda state-of-the-art method [8] resemble the sketches byartists well.

2 Initialization of Face Photo-Sketch Syn-thesis

Face photo-sketch synthesis is initialized via sparserepresentation[1]. Given a face photo P as input, in thissection, we give an initialization to its correspondingface sketch S using sparse representation.

21st International Conference on Pattern Recognition (ICPR 2012)November 11-15, 2012. Tsukuba, Japan

978-4-9906441-0-9 ©2012 ICPR 3025

2.1 Dictionary Preparation

We use sparse coding to build the local coupled dic-tionaries for sketch patches and photo patches. De-note pi to be the i-th photo patch in the test photo, sito be the desired sketch patch. Since the same facecomponent is roughly in the same region of photosand sketches due to geometry alignment, we divide thetraining face photos and sketches into a set of over-lapping patches {pi}n

i=1 and {si}ni=1 by scanning the

whole image (See Fig. 1). For each image patch, acoupled dictionary is built using image patches withinthe local region of the training photo and sketch set.Instead of building dictionaries on raw image patches,we use the sparse coding [3] to get a succinct dictio-nary. We assume that the sparse linear relationship of agiven photo patch with respect to the photo bases of thecoupled dictionary are maintained for the correspond-ing sketch patch with respect to sketch bases.

Sparse coding is an unsupervised learning algorithmfor discovering concise high-level basis vectors us-ing a large number of unlabeled data[3]. For an in-put data ai, sparse coding discovers basis vector set{b1, . . . ,bm}(i = 1...m). Thus ai can be representedapproximately as linear combination of basis vectors,i.e. ai ≈ b1t1+. . .+bmtm, in which ai relies on a fewbasis vectors, and many linear coefficients are equal tozero. Then the basis vectors are learned by minimizingan energy function, which consists of two items: the re-construction error and l1-norm penalty which can guar-antee that the coefficients to be sparse. The problem issolved by an iterative strategy of two convex optimiza-tion problems.

Denote (DiP , Di

S) to be an local coupled dictionaryof the i-th patches in photo and sketch. Di

P is composedof basis vectors of the i-th photo patch, and Di

S is com-posed of corresponding basis vectors of the i-th sketchpatch. For each photo patch pi, we find its sparse rep-resentation with respect to local photo dictionary Di

P

as pi = DiP αi. According to the assumption, with the

sparse coefficient αi and local sketch dictionary DiS , we

can initialize the sketch patch as si = DiSαi.

2.2 Sparse Representation

Given the photo patch pi, the sparse representationof pi with respect to the local dictionary Di

P can be for-mulated as a constrained l1-norm optimization problemas follows:

αi = arg min ‖αi‖1 s.t. DiP αi = pi, (1)

The minimization problem has the same formulation asLasso for linear model estimations in statistics[2], thusαi can be solved easily with Lasso. Sparseness is de-sired due to its succinct representation ability and itsdiscriminative nature.

We reconstruct an initialized face sketch by stitchingthe estimated sketch patches si = Di

Sαi, and use theobtained αi as the initial value of following face sketchrefinement.

Figure 1. The overlapped image patches of the imagepatch and its adjacent image patches.

3 Face Photo-Sketch Synthesis Method

In this section, we use the texture smoothness con-straint of sketch to refine the initialized results by themethod in Section 2, which can alleviate mosaic effectsof the initialized sketch.

3.1 Face Sketch Refinement

The synthesized sketch in Section 2 usually has mo-saic effects due to the fact that each local patch is inde-pendently synthesized and the local texture smoothnessconstraint is not enforced. We proposed a sketch syn-thesis approach via sparse representation, which is ca-pable of enforcing local texture smoothness constraintand alleviating mosaic effects. An energy function isminimized to solve the problem. The energy functiondenoted as E (See Eq.(2)) consists of three parts: 1) E1measures the difference between the original face photoand the face photo synthesized by sparse coefficient; 2)E2 measures the smoothness of the synthesized sketchpatches with its adjacent patches; 3) E3 is an l1-normregularization term to enforce the sparse representation.The coefficients α1, . . . , αn of the refined face sketchescan be obtained by minimizing the energy function E:

E = E1 + β2E2 + λE3 (2)

E1 =n∑

i=1

‖pi − DiP αi‖2

E2 =n∑

i=1

∑

j∈N (i)

∥∥∥TiDiSαi −TjD

jSαj

∥∥∥2

E3 =n∑

i=1

‖αi‖1

where β2 and λ are regularization parameters to balancethe weights of E1, E2, E3, n is the number of imagepatches in each photo and sketch image, pi ∈ Rd is thei-th photo patch, N (i) is the set of neighboring patchesof pi, Ti and Tj are the matrixes which extract theoverlapped regions of neighboring patches pi and pj ,and Di

Sαi and DjSαj to be the corresponding sketch

patches of si and sj . Denote α = [αT1 , . . . , αT

n ]T , then

3026

the energy function (2) can be transformed into the fol-lowing equivalent problem:

E = ‖X− Dα‖2 + λ‖α‖1 (3)

where α ∈ Rmn×1, m is the number of basis vectorsby sparse coding, and n is the number of patches in animage. In our experiment, the number of image patchesn is about 5000, m is 128, thus the dimension of α isquite high. Optimization of the energy function, whichis a large scale convex optimization problem with l1-norm regularization, is a non-trivial task, and traditionalmethods for l1-norm optimization problem is infeasible.

We propose a decomposition method to solve theproblem. The energy function of (3) is decomposedinto a series of small problems. We divide all the vari-ables in the energy function to be the sets of workingset and non-working set. In each iteration, variables inthe working set is chosen and the corresponding sub-problems are recomputed. The working set is selectedby computing the difference of gray value of the over-lapped regions between each image patch and its adja-cent patches. The difference is measured by the l2-normdifference. For a given threshold ε, if the i-th imagepatch and its adjacent image patches violates at leastone of the following conditions

‖TdFiu−TuFi‖ ≤ ε, ‖TuFid

−TdFi‖ ≤ ε

‖TrFil−TlFi‖ ≤ ε, ‖TlFir

−TrFi‖ ≤ ε

where Fi = DiSαi is the feature vector of the i-th

image patch, Fiu= Diu

S αiu,Fid

= Did

S αid,Fil

=Dil

Sαil,Fir

= Dir

S αirare the feature vectors of its up,

down, left and right neighboring patches, respectively.Then we call the i-th image patch an ε-smooth violat-ing image patch. Here, Diu

S , Did

S , Dil

S , Dir

S are the localsketch dictionaries of the up, down, left and right neigh-boring image patches.

Suppose patch i to be a randomly selected ε-smoothviolating image patch. We update its sparse coefficientby minimizing the following small scale problem

minαi

‖αi‖1

s.t.

||pi − DiP αi||22 ≤ ε2

||TdDiu

S αiu−TuDi

Sαi||22 ≤ ε2

||TuDid

S αid−TdDi

Sαi||22 ≤ ε2

||TrDil

Sαil−TlDi

Sαi||22 ≤ ε2

||TlDir

S αir−TrDi

Sαi||22 ≤ ε2

.(4)

where αiu, αid

, αil, αir

are coefficients of the up,down, left and right neighboring patches of patch i inthe previous iteration, and ε is the upper bound of thenoise term. Here, the coefficients are initialized withthe results Section 2.

The above constrained optimization can be solved by

the following unconstrained optimization problem

minαi

‖pi − DiP αi‖22

+β21‖TdDiu

S αiu−TuDi

Sαi‖22+β2

2‖TuDid

S αid−TdDi

Sαi‖22+β2

3‖TrDil

Sαil−TlDi

Sαi‖22+β2

4‖TlDir

S αir−TrDi

Sαi‖22 + λ‖αi‖1 (5)

Here, we choose β1 = β2 = β3 = β4 = β, theoptimization (5) can be transformed into

minαi

‖X− Dαi‖22 + λ‖αi‖1 (6)

where

D = [(DiP )T , β(TuDiu

S )T , β(TdDid

S )T ,

β(TlDil

S )T , β(TrDir

S )T ]T

X = [pTi , β(TdDiu

S αiu)T , β(TuD

id

S αid)T ,

β(TrDil

Sαil)T , β(TlDir

S αir)T ]T

The coefficient αi can be updated by Lasso [2].Proposition 1 If αi is updated by Eq.(6), then the en-ergy function (2) with the updated αi decreases.Proof. Denote Et to be the energy function (2) at the t-th iteration, αt

i, αtiu

, αtid

, αtil, αt

irare coefficients in the

t iteration, Eti to be the following energy function

Eti = ‖pi − DP αt

i‖22+β2‖TdDiu

S αtiu−TuDSαt

i‖22+β2‖TrD

il

Sαtil−TlDSαi

t‖22+β2‖TuD

id

S αtid−TdDSαi

t‖22+β2‖TlDir

S αtir−TrDSαt

i‖22 + λ‖αti‖1(7)

and Eti = Et − Et

i . Since αt+1i is obtained by the

minimization of (7) with respect to αti, we have Et+1

i <

Eti . Note that E

t+1

i = Et

i, we have Et+1 = Et+1

i +Et+1

i < Et = Et

i + Eti .

The optimization will be iterated for randomly se-lected ε-smooth violating image patches until no moresuch violating patches exist or it exceeds the predefinedmaximum iteration number.

3.2 Method Summary

Our smoothness-constrained face photo-sketch syn-thesis using sparse representation algorithm can besummarized in Algorithm 1, and the method can alsosynthesize a face photo given a sketch drawn by anartist, by simply switching roles of photos and sketches.Remark 1: A simple method is adopted to enforceinter-patch relationships by averaging the gray valuesin the overlapped area between adjacent patches.

3027

Figure 2. Comparison between the sketch synthesis re-sults. (a) photos; (b) sketches drawn by artists; (c) sketches bymethod [4], which are copied from [8]; (d) sketches by method[8]; (e) initial sketches without smoothness-constraint[1]; (f)refined sketches by our method.

Algorithm 1 Smoothness-Constrained Face Photo-Sketch Synthe-sis Using Sparse Representation

1: Input: the training set of face photos and the face sketches, anda test face photo.

2: Output: a synthesized sketch for the test face photo.3: For each patch i in the training set of photo and sketch, construct

the local coupled dictionary {DiP , Di

S} by sparse coding[3].4: Initialize the sketch image via sparse representation(Section 2).

For each photo patch i, compute its sparse representation coeffi-cient αi with respect to Di

P by Eq. (1), and initialize the sketchpatch sj by Di

Sαi.5: while no ε-smooth violating patch exists or it exceeds the pre-

defined maximum iteration number do6: At iteration t, select for ε-smooth violating image patches in

the sketch.7: if the j-th patch is ε-smooth violating then8: we update αt

j and sj = DjSαt

j by solving the minimiza-tion problem (6);

9: else10: the sparse representation of sj keeps unchanged.11: end if12: end while13: Reconstruct the sketch by stitching the estimated sketch patches.

4 Experiments

In our experiments, a face photo-sketch databasefrom the CUHK student database was used [7]. Thedatabase contains 88 faces for training and 100 faces fortesting. For each face, a sketch by an artist and a phototaken in the front pose are given. The feature vectors ofthe photos and sketches are represented by the gray val-ues inside the corresponding photo and sketch patches.

4.1 Face Sketch Synthesis

In our experiments, the size of all the face and sketchimages are 160 × 120. The size of image patch is7 × 7, and the overlapping area for adjacent patchesis 5 × 7. The regularization parameters β and λ inl1 minimization are set to be 1.0 and 0.1 respectively,and the maximum iteration number in Algorithm 1 isset to 100. In Fig. 2, we compare our method withour previous method [1] and the state-of-the-art meth-ods in [4][8]. The results by our method are close to

the sketches drawn by the artist. Although the synthe-sis of human hair is challenging due to the variation inthe hair style, our method based on local dictionariescan synthesize the hair region well. Our synthesizedsketches are less blurred and cleaner than the methods[4], and the face structure as well as details in sketchesare synthesized well. In Fig.2 (e)(f), we compare thesynthesized sketches with and without the smoothness-constraint. The results without the smoothness con-straint [1] are noisy and have mosaic effect. The resultsare improved greatly with the smoothness-constraint,and our synthesize sketches are cleaner and have muchless mosaic effects. In Fig. 2 (d)(f), the results withour method are compared with the results by the state-of-the-art method [8]. We give more synthesized facesketch results in Fig. 3. Moreover, our method can syn-thesize a face photo with a sketch by switching roles ofphotos and sketches.

5 Conclusion

We propose a face photo-sketch and sketch-photosynthesis method that exploits the nature of sparse rep-resentation as well as the smoothness prior of sketchesand photos. Efforts have been taken to build suc-cinct coupled dictionaries for local image regions andcompute the sparse representation coefficient. In or-der to exploit the local smoothness of synthesized facesketches and photos, we propose an efficient convexoptimization approach to refine the initialized sketchesand photos by solving a sequence of small scale l1-normoptimizations. Our method has advantages in that theface sketches and photos can be synthesized by usingsparse representation as well as the smoothness priorof synthesized images. Experiments show that the ob-tained sketches and photos resemble the true sketchesand photos well.

Figure 3. Face sketch synthesis results using smoothness-constrained face sketch synthesis method. The first row isthe face photos, the second row is the face sketches drawnby artists, the third row is the synthesized face sketches.

3028

6 Acknowledgement

The work is supported by Fundamental ResearchFunds for the Central Universities, National NaturalScience Foundation of China (No. 61005039), NationalNatural Science Foundation of China (No. 61170203).

References[1] L. Chang, M. Zhou, Y. Han, and X. Deng. Face sketch

synthesis via sparse representation. ICPR, 2010.[2] B. Efron, T. Hastie, I. M. Johnstone, and R. Tibshi-

rani. Least angle regression. The Annals of Statistics,32(2):407–499, 2004.

[3] H. Lee, A. Battle, R. Raina, and A. Y. Ng. Efficientsparse coding algorithms. NIPS, 2006.

[4] Q. Liu, X. Tang, H. Jin, H. Lu, and S. Ma. A nonlin-ear approach for face sketch synthesis and recognition.Proceedings of CVPR, 2005.

[5] S. Roweis and L. Saul. Nonlinear dimensionality reduc-tion by locally linear embedding. Science, 290:2323–2326, 2000.

[6] X. Tang and X. Wang. Face sketch recognition. IEEETrans. on CSVT, 14:50–57, 2004.

[7] the Chinese University of Hong Kong. Student sketchdatabase. http://mmlab.ie.cuhk.edu.hk/facesketch.html.

[8] X. Wang and X. Tang. Face photo-sketch synthesis andrecognition. IEEE Trans. on PAMI, 2009.

[9] J. Wright, A. Yang, A. Gannesh, S. Sastry, and Y. Ma.Robust face recognition via sparse representation. IEEETrans. on PAMI, 31(2):210–227, 2009.

[10] J. Yang, J. Wright, Y. Ma, and T. Huang. Imagesuper-resolution as sparse representation of raw imagepatches. Proceedings of CVPR, 2008.

[11] J. Zhang, N. Wang, X. Gao, D. Tao, and X. Li. Facesketch-photo synthesis based on support vector regres-sion. ICIP, 2011.

3029

smoothness-constrained face photo-sketch · pdf filesmoothness-constrained face photo-sketch...

Documents