
DOI: 10.1111/cgf.13540. Computer Graphics Forum, Volume 00 (2018), number 00, pp. 1–14

Learning A Stroke-Based Representation for Fonts

Elena Balashova¹, Amit H. Bermano¹, Vladimir G. Kim², Stephen DiVerdi², Aaron Hertzmann² and Thomas Funkhouser¹

¹ Department of Computer Science, Princeton University, Princeton, NJ, USA ({sizikova, abermano, funk}@cs.princeton.edu)

² Adobe Research, Seattle, WA, USA ({vokim, diverdi, hertzman}@adobe.com)

Abstract

Designing fonts and typefaces is a difficult process for both beginner and expert typographers. Existing workflows require the designer to create every glyph, while adhering to many loosely defined design suggestions to achieve an aesthetically appealing and coherent character set. This process can be significantly simplified by exploiting the similar structure that character glyphs share across different fonts and the shared stylistic elements within the same font. To capture these correlations, we propose learning a stroke-based font representation from a collection of existing typefaces. To enable this, we develop a stroke-based geometric model for glyphs and a fitting procedure to reparametrize arbitrary fonts to our representation. We demonstrate the effectiveness of our model through a manifold learning technique that estimates a low-dimensional font space. Our representation captures a wide range of everyday fonts with topological variations and naturally handles discrete and continuous variations, such as the presence and absence of stylistic elements as well as slants and weights. We show that our learned representation can be used for iteratively improving fit quality, as well as exploratory style applications such as completing a font from a subset of observed glyphs, interpolating between fonts, or adding and removing stylistic elements in existing fonts.

Keywords: typography, curves & surfaces

ACM CCS: Computing methodologies → Shape analysis

1. Introduction

Font collections are ubiquitous in design tools ranging from word processors to graphics editors. Although professional typographers perpetually create new and diverse families of fonts, it is not always possible to find a font with the desired visual features, and thus designers often have to compromise by selecting a font that does not fit their graphic message.

Designing glyphs that have the desired style, aesthetic appeal and coherence, and that follow good typographic practices, requires substantial expertise and time investment. Even a small edit to one glyph (e.g. making the base of the 't' a little wider) can require subtle updates to many other glyphs to maintain coherence within the font. Since tools are not available to support these edits, font customization is generally inaccessible to all but a few expert font designers.

The goal of our work is to describe a part-aware font representation and show how it can be integrated with machine learning techniques to automate font manipulation (see Figure 1). The described system is designed as a first step to assist non-experts in the design of customized fonts; it is not intended to compete with carefully designed high-quality fonts. We observe that priors on glyph structure and stylistic consistency are implicitly encoded in existing typefaces. We therefore hope to capture this knowledge in a generative model and use it to facilitate font editing, completion, interpolation and retrieval.

The first challenge in learning from existing typefaces is that they do not share a common structure. Glyphs are typically represented by closed outline curves composed from primitives such as lines and Bezier curves, with no consistency across different fonts even for the same character.

One approach to achieving a consistent representation would be to rasterize the glyph to a fixed-size image [USB16]; however, this representation loses the advantages of vector graphics and does not allow preserving sharp features of the original glyph geometry.

Another option is to represent each glyph's outline with a polyline such that the endpoints of each line segment are in correspondence across all glyphs [CK14]. An outline-based representation makes it difficult to reason about decorative elements such as serifs that are present in only a subset of glyphs. It also ignores the stroke-based structure of characters, which is often consistent across glyphs and can be used to facilitate analysis.


Figure 1: We learn style variations from existing typeface collections by representing them using a consistent parametrization, and projecting onto a low-dimensional manifold. We start with a set of glyph examples that are not consistently parametrized, and use our fitting method to produce a part-based template parametrization that is consistent across same-letter glyphs. We project the parametrization features into a low-dimensional manifold and thus produce a missing-data-aware generative model. The resulting manifold can be used for exploratory applications to understand collections of fonts: topology-aware font retrieval, completion and interpolation.

Phan et al. [PFC15] proposed a stroke-based glyph representation; however, their method still requires significant human interaction during the learning step to convert given outlines to the proposed representation, which hinders non-expert usage.

In this work, we represent a glyph as a set of strokes and outline segments, where the latter are defined with respect to the coordinate system of their respective stroke. Many fonts are constructed with the appearance of brush strokes, to varying degrees; we believe they are better modelled by a spine and profile curves. A better model means more appealing interpolation and synthesis of characters as compared to raster or polyline representations. Consistency in stroke structure and low variance in profile offsets are among the key ingredients of our parametrization, which make it easier to learn correlations between different parameters. This representation can also naturally handle discrete topological variations (i.e. addition and removal of strokes, such as serifs).

We take advantage of the consistency in the stroke structures of letters to create a common template for each letter. We propose an optimization procedure to align the template to the input outline, enabling us to reparametrize any input glyph consistently with the other glyphs of the same letter. Unlike previous work on outline alignment [CK14], our method does not require joint analysis of the entire data set, which makes it scale linearly with the number of fonts and allows us to process a collection of fonts an order of magnitude larger.

We formulate our representation as a concatenation of glyph descriptors for the entire font. Since there are codependencies in the concatenated glyph descriptors, due to stylistic similarities within a font and global structural similarities across fonts, it is possible to learn a reduced-dimensionality representation that captures the salient shape variations within a font collection. However, data are not always available for all glyphs (if only certain letters have been created by a designer for some font, or if the template fitting algorithm produces a poor fit). Therefore, we employ an Expectation Maximization Principal Component Analysis (EM-PCA) algorithm [Row98] that is robust to missing data to learn the reduced-dimensionality representation. This model is a linear form of the more general Variational Autoencoder (VAE) [KW13] (we also experiment with non-linear VAEs, and find that the linear EM-PCA model works better for our cases). The result is a projection to and from a latent font space that can be used for manipulation, interpolation and generation of fonts.

We use a data set of publicly available fonts to benchmark the quality of fitting our stroke-based representation to unstructured outlines and evaluate the reconstruction quality from the learned manifold. We demonstrate that we can consistently parametrize many existing fonts with our template, and use this parametrization to learn a common representation for fonts. We leverage our learned model for topology-aware font retrieval and for completing missing glyphs from a subset of outlines.

2. Related Work

Located at the intersection of graphics and information design, font design has been a well-studied field for decades [Zap87, Car95, Tsc98, Tra03, OST14, as8]. Our work is therefore informed by rich prior research on the representation, analysis and synthesis of fonts and shape collections. The most influential and relevant previous work is described below.

Font Representations Font glyphs are examples of complex 2D shapes that convey both meaning and appearance. There are generally three common ways to represent fonts: bitmaps, strokes and outlines. Bitmap representations are easy to use and compatible with computer code, but do not scale without introducing artefacts. Stroke representations capture glyph essence well and are generally more compact [Gon98] (which is extremely advantageous for fonts with thousands of glyphs, such as Chinese, Japanese and Korean (CJK) fonts [LZX16]), but are generally less expressive [JPF06]. Outline-based representations are more expressive [JPF06], but require hinting to counteract rendering artefacts [hin, HB91, Her94]. Still, outline fonts are prevalent for representing Latin-script typefaces in popular scalable computer formats [Mic16].

Typically, customizing a font requires tedious outline or stroke editing, which is addressed by parametric font representations [Knu86]. In industry, a popular approach is to use master fonts, which enable high-level parametric control by interpolation with consistently placed control points [Ado97]. The recent OpenType format [Mic16] allows more elaborate parametrization during font design, for example, by continuously varying width or weight. The main limitation of these techniques is that they require typeface designers to provide a parametrization of the font. This might be possible within a small family of related fonts (i.e. a typeface), but these parametrizations are still inconsistent across typefaces. In contrast, we aim to achieve similar expressiveness while imposing minimal effort on typographers, simply by analysing existing fonts.

To simplify the process of creating parametrized fonts, several systems have been proposed that prescribe a mapping between a few high-level font parameters and the control points of every glyph. For example, the Metafont system [Knu86] was used to create the Computer Modern typeface families governed by a small set of parameters. These control continuous attributes, such as the width and height of glyphs, as well as discrete variations, such as the presence of serifs. Hersch and Betrisey [HB91] described a set of rules to fit a topological model to outlines to enable automatic hinting. Shamir and Rappoport [SR98] proposed a visual tool for designing parametric fonts with constraints, and Hu and Hersch [HH01] provided a richer set of geometric components to define a parametric typeface. Shamir and Rappoport [SR99] describe a procedure to represent outline-based oriental fonts using a hand-designed parametric font model and a procedure to compact the resulting representation using quantization. This work differs from our method in several ways: first, their parametrization requires manual assistance; second, their system fits deformable primitives to glyphs and does not aim for a consistent primitive set; it therefore cannot be used for learning a manifold of fonts or for reasoning about part relationships within each glyph.

Learning Font Representation The first step in learning a font representation is establishing consistency across glyphs. One simple approach is to rasterize a glyph to a fixed-size image. This is effective for recognition [WYJ*15], but poses challenges for font synthesis. Current image synthesis methods yield blurry results, omit small features and fail to enforce shape and curve continuity at the resolutions required for most fonts [USB16] (see Figure 11b).

Early work in this direction [Ada89] proposed decomposing glyphs into parts and deriving rules that share design attributes across glyphs to encourage consistency. In this system, all control glyphs have to be prescribed to create the complete font, and all relationships between prescribed and synthesized glyphs are defined manually. Campbell and Kautz [CK14] introduce a vector representation for learning font shapes, and learn a manifold of fonts. Their method is based on correspondences between glyph outlines. They jointly optimize for all control points by sliding them along the input outline while matching geometric features (e.g. normals and curvature) and preserving the original spacing (i.e. arclength, via an elasticity objective). This approach does not explicitly recognize parts, and hence causes distortions in features like serifs when interpolating serif and sans-serif fonts. In addition, this approach does not lend itself to learning the common shapes in different glyphs. For example, the upper loops of the two common forms of 'g' are very similar, even though the overall outlines are not. Similarly, decorative elements, such as serifs, need to be consistent in style, and together influence attributes of the glyph, such as its width. Please see [Ada86, Ada89] for an extended discussion of common aesthetic principles. We allow glyphs with different topological structure to share a subset of strokes. This enables us to learn, for example, the shape of the upper loop of 'g' regardless of the shape of the bottom, and to learn the shape of serifs only from glyphs that have them. The resulting stroke-based template with potential topological variations can be matched to each glyph independently, which enables us to avoid joint analysis and to process an order of magnitude more fonts.

Phan et al. [PFC15] also propose to leverage a stroke-based representation, modelling a glyph with brushes and caps aligned to canonical strokes. However, their method requires manual labelling of glyphs, limiting its applicability. It also represents each glyph as a set of stroke curves with profile offsets, a representation which suffers from artefacts in high-curvature regions. Similarly, Lian et al. [LZX16] rely on a stroke/profile representation to generate hand-written fonts. Their proposed fitting procedure focuses only on stroke fitting and is very similar to our initialization [MSCP*07], but our contribution lies in the optimization for Bezier control points that comes after stroke initialization. Finally, their system assumes that each type of stroke appears at least once in the input set and that the user provides at least several hundred characters, while our system does not have these requirements. Suveeranont and Igarashi [SI10] propose a weighted blend of outlines and skeletons to model new fonts as a linear combination of existing fonts. As they observe in their paper, linearly combining good fonts does not always yield a plausible result, since it does not capture codependencies between different parameters. Their solution is to limit the synthesis to a small set of the most similar examples, requiring a dense sampling of font space.

Analysing 3D Shape Collections Our method is also related to existing work on analysing 3D shape collections. In that domain, boundary-based consistent parametrization techniques [PSS01] were initially used to analyse similar surfaces with little topological variation. Part-based templates have been developed to handle more complex shapes [KCKK12, KLM*13], which enables a better analysis of topological variations [AXZ*15]. As with font analysis, they provide means to analyse relationships between objects that share only a subset of parts, and provide effective priors on global structure. Recently, several techniques have been proposed to analyse stylistic compatibility between objects [LKS15, LHLF15], aiming to separate content (i.e. object class) and style. Lun et al. [LKWS16] also propose a non-parametric method for transferring style by curve-based deformation, addition and removal of parts. These approaches can be viewed as analogues to the problem of synthesizing fonts by transferring style from designer-created glyphs. The main advantage of our setting is that we can leverage a vast amount of data where the same content (i.e. letters) is presented in various coherent styles (i.e. each font is stylistically coherent). Of course, we provide a parametric model for glyph geometry, which is less challenging than arbitrary 3D shapes.

3. Overview

Typographers invest significant effort into designing glyphs that maintain a consistent style and adhere to good design principles.


Figure 2: Deformable template representation: an example for the letter 'a'. Our model is described by parameters of strokes ($\theta_s$), outlines ($\theta_o$) and caps (left), as well as topological parameters ($\theta_t$) indicating which strokes need to be present, and outline connectivity constraints (middle). Multiple topologies are prescribed for the same letter (middle, bottom). In each topology, one base skeleton segment ($s_{a,b}$) is represented in image-space coordinates, while all other segments are offsets from it, or from other segments, in a tree-like manner (right).

Thus, a well-designed font encompasses many implicit structural and stylistic relationships between the glyphs or their parts. The goal of this work is to learn these relationships from existing fonts and use them to simplify the design process of new ones. In particular, given a collection of fonts, we learn a low-dimensional representation that can be used to complete partial designs, interpolate between existing fonts and provide easy controls to add and remove different stylistic elements, such as serifs.

To achieve this goal, a low-dimensional manifold needs to be learned across a variety of fonts. Since it is challenging to perform manifold learning on a general (inconsistent) font representation, we propose a two-step iterative process (Iterative Manifold Evolution, Section 7) to overcome these challenges: we parametrize fonts consistently by fitting the font model to all glyphs, and then we update the model using the newly parametrized fonts. This two-step process iterates until convergence to learn a font representation for all glyphs in a lower-dimensional space.

We propose a stroke-based model that is concise, expressive and semantically driven, enabling us to factorize a glyph shape into meaningful elements. In particular, we represent each glyph with a set of strokes, where each stroke consists of a skeleton curve, two outline curves on both sides of the skeleton and a single cap curve for every endpoint. We represent all curves with Bezier control points. The skeleton control points of one skeleton segment are defined in the global coordinate system, while the other parts, the outlines and the caps, are defined relative to that base curve or to other skeleton segments in a tree-like manner (see Figure 2, right). This naturally separates profile features (e.g. width) from stroke features (e.g. slant), which facilitates learning relationships; outline control points will have a larger offset from the skeleton in a bold font, and skeleton strokes will have a slope in an italic font. The stroke-based representation also naturally handles topological variations in glyphs. For instance, the two forms of 'g' can share parameters for the control points of the upper loop, but have different strokes to represent the tail. Similarly, decorative elements, such as serifs, can be represented with optional strokes. Thus, our consistent representation is defined with respect to a set of strokes, which we call a deformable glyph template, where the template parameters include topological ones (which strokes are included) and geometric ones (control points of Bezier curves) (see Section 4).

To represent existing font outlines with our stroke-based model, we propose a fitting procedure for the deformable template. Our fitting process takes advantage of the fact that stroke skeletons do not vary significantly across fonts, since letters are recognized by specific stroke structures. Thus, we start by fitting the strokes of the template to the input glyph, followed by outline fitting, deforming the template Bezier curves to align with the boundaries of the input (see Section 5). Our template fitting process eliminates the need for joint analysis of all fonts and is trivially parallelizable, which enables us to process 570 fonts, an order of magnitude more than previous techniques (such as Campbell and Kautz [CK14]). We focus our analysis on lowercase letters, as prior work highlighted that their design is more differentiated [Ada89]; however, the system can be extended to uppercase letters and other characters by generating more templates.

To represent the entire font, we concatenate the individual glyph representations across all letters, generating a vector in a high-dimensional feature space. Since glyphs share many stylistic and structural attributes, we expect this representation to be overcomplete, and we learn a lower-dimensional font representation. For any given font, there will be missing entries: the feature vector includes separate entries for the different forms of 'g' (only one of which will appear in any given font), as well as serif parameters that do not occur in sans-serif fonts. Thus, we will not have the entire feature vector at training or testing time. In addition, at test time, reasoning about the font representation by observing only a subset of the glyphs gives rise to many applications, such as font completion (see Section 9). We chose the EM-PCA approach [Row98], which provides a simple linear map for dimensionality reduction and generalizes PCA to allow for missing feature values at test and training time (see Section 6).

4. Stroke-Based Deformable Template

Description We use a parametric deformable template $T_c(\Theta)$ to represent each letter $c$ in the set of glyphs, where $\Theta$ are the deformation parameters that define the shape of the glyph generated by $T_c(\Theta)$. Our template is described by a set of skeleton segments $S_c$ and outline segments $O_c$, as shown in Figure 2 for the example case $c =$ 'a'. A single skeleton segment, named the base segment $s_{c,b}$, is represented by the curve location in image space. The other skeleton segments are represented as offsets from $s_{c,b}$ or from another segment in a tree manner. Furthermore, every stroke corresponds to one skeleton segment $s \in S_c$, and each such segment corresponds to two outline segments on opposite sides of it: $\{o_s^+, o_s^-\} \in O_c$. We represent a cap as a special-case skeleton stroke that has only one outline ($o_s^c$) associated with it. This dependency structure ensures consistent part and outline positioning during the fitting process, without the need to locally (or globally) align parts across different fonts. Note that one could also encode the template using angle relationships between segments; however, we found positional differences to be more robust (as a slanted 'f' still consists of horizontal wings), and they produced better results for the learning step (see Section 6).


Figure 3: For each letter template, we provide a set of example layouts. Some depict topological variation (e.g. for the letter 'a', Figure 2). Some distinguish between decorative and sans-serif examples (e.g. 'u', left). Some templates include only one example (e.g. 'e', middle). These examples adhere to the connectivity and smoothness constraints defined by the template; for example, the strokes of 'c' are constrained to be C1 continuous (right). See the supplemental material for all template examples.

Lastly, the template also prescribes the ordering of the outline segments on the generated outline and their connectivity constraints, as depicted in Figure 2 (middle). The connectivity constraints ensure that we always reconstruct a continuous outline, and practically remove redundancies from our representation. When a stroke is missing (e.g. a serif stroke in a sans-serif font), it creates a gap in the outline, which is trivially closed by connecting the two adjacent segments based on the ordering. In addition, Figure 2 (left) also demonstrates an example skeleton, its corresponding profile curve and caps.

All curves in our framework are cubic Bezier curves, parametrized by their control points. Most skeletons and outlines are represented by a single curve, but long or more complex strokes are represented by two (which are enforced to be C1 continuous at the seam). The connectivity and smoothness constraints render some of the four Bezier control points redundant, and thus they are removed from the representation. We denote the control point coordinates by $\theta_s \in \Theta_s$ for skeleton strokes and $\theta_o \in \Theta_o$ for outline curves. The control points of the outline curve are defined relative to the skeleton points, and skeleton points are defined relative to points on $s_{c,b}$ or other segments in a tree structure, as depicted by arrows in Figure 2 (left). This coordinate frame directly relates to meaningful properties such as the brush width profile, and so is the natural choice for outline representation. In addition, our template deformation parameters also allow discrete topological changes, which we denote by a vector of binary values, $\Theta_t$. Each value corresponds to the presence or absence of a single stroke. Thus, the shape of a deformed template is defined by the vector $\Theta = (\Theta_s, \Theta_o, \Theta_t)$. To facilitate fitting, we also provide a set of example topologies and geometries for each template, as demonstrated in Figure 2 (right bottom) and Figure 3.
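To make the template parametrization concrete, the following Python sketch shows one way such a structure could be organized. It is a minimal illustration under stated assumptions: the class and field names are invented here, and the convention that a child segment is offset from its parent's last control point is an assumption (the text only specifies a tree of relative offsets). It is not the authors' implementation.

```python
import numpy as np
from dataclasses import dataclass, field

def cubic_bezier(p, t):
    """Evaluate a cubic Bezier curve with control points p (4 x 2) at parameters t in [0, 1]."""
    t = np.asarray(t)[:, None]
    return ((1 - t) ** 3 * p[0] + 3 * (1 - t) ** 2 * t * p[1]
            + 3 * (1 - t) * t ** 2 * p[2] + t ** 3 * p[3])

@dataclass
class SkeletonSegment:
    control_offsets: np.ndarray               # (4, 2); absolute for the base segment, relative otherwise
    parent: int = -1                          # -1 marks the base segment s_{c,b}
    outline_offsets: list = field(default_factory=list)  # o+, o- (and o^c for caps), relative to the skeleton

@dataclass
class GlyphTemplate:
    letter: str
    segments: list                            # tree of SkeletonSegment, assumed listed parent-before-child
    topology: np.ndarray                      # binary vector Theta_t: which strokes are present

    def absolute_skeletons(self):
        """Resolve the tree of relative offsets into absolute skeleton control points."""
        absolute = [None] * len(self.segments)
        for i, seg in enumerate(self.segments):
            if seg.parent < 0:
                absolute[i] = seg.control_offsets.copy()
            else:
                # Assumed convention: child control points are offsets from the parent's last control point.
                absolute[i] = seg.control_offsets + absolute[seg.parent][-1]
        return absolute
```

Evaluating `cubic_bezier` on each resolved segment yields the dense curve samples used by the fitting objective in Section 5.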

Number of Templates For each character, we generally have two templates: one for the serif and one for the sans-serif version. Some characters (such as 'k') present significant variations in topology between fonts and hence warrant another template in order to be captured well, while others were successfully represented using only one template (such as 'o'). A general rule is to capture sufficient glyph variation to be used by the learning algorithm (see Section 8.1). The template generation process involves annotating junctions and part connectivity. We provide the precise templates for every letter in the supplemental material.

5. Template Fitting

We now describe the template fitting procedure that finds the optimal deformation parameters for an input glyph $G_{c,f}$ of the letter $c$ from a font $f$. Our fitting process is an optimization procedure in which we optimize for the parameters $\Theta$ such that the outlines of $G_{c,f}$ and $T_c(\Theta)$ become as similar as possible, in a consistent manner. The template $T_c$ provides the structure and geometric model for the glyph, as well as an example for every possible topology of the glyph with approximate stroke shapes: $E_c = \{\Theta^{c,i}_t, \Theta^{c,i}_s : i = 1..K_c\}$ (see Figure 3 for examples and the supplemental material for all details). A demonstration of our process' consistency can be seen in Figure 4. In the following, we describe the considerations we account for when searching for a good fit (Section 5.1), describe the iterative optimization procedure (Section 5.2) and provide an initialization procedure for the template deformation process (Section 5.3). Note that this fitting procedure is further refined by iteratively leveraging and evolving a manifold of the learned fonts, as described in Section 7.

5.1. Fitting quality

In the following, we describe the objective function that evaluates the quality of a template fit. Our function favours shape correspondence between the input and the deformed template, alignment of feature points (sharp turns and corners), and avoids undesired local minima through regularization.

5.1.1. Correspondence

First, we penalize dissimilarities between the deformed template and the glyph:

$$E_{corr}(G_{c,f}, \Theta) = \sum_{x \in P(G_{c,f})} h_{\sigma_{corr}}\big(D(x, O_c(\Theta))\big) + \sum_{y \in P(O_c(\Theta))} h_{\sigma_{corr}}\big(D(y, G_{c,f})\big), \qquad (1)$$

where $O_c(\Theta)$ is the outline generated by the current configuration $\Theta$ of the deformable template, $P(\cdot)$ is a dense sampling of a curve and $D$ measures point-to-curve distance. In practice, we sample 100 points per outline segment and use a KD-tree for nearest-neighbour queries. We use a Gaussian kernel $h_{\sigma}(x) = 1 - e^{-(x/\sigma)^2}$ of size $\sigma_{corr}$ to reduce the influence of outliers and to better control the optimization behaviour. The effect of this term is depicted in Figure 6 (bottom left).

5.1.2. Normal consistency regularization

Measuring the overall bidirectional distance between the two curves in Euclidean space can attract incompatible regions, sometimes as far away as the opposite side of the outline. We remedy this issue by favouring similar normal directions of the deformed template and the target outline it is projected onto. In particular, we augment our point representation with normal directions, lifting a 2D point $p = [p_x, p_y]$ on the target outline to 4D space: $[p_x, p_y, w_n n_x(p), w_n n_y(p)]$, where $w_n$ is a constant weight and $n(p)$ is the normal at point $p$. Specifically, for our deforming template outline, we lift each point $q$ to become $[q_x, q_y, w_n n_x(q), w_n n_y(q)]$, and we do the same for the target glyph $G_{c,f}$.


Figure 4: Consistency evaluation. Our template fitting process is focused on consistent semantic annotation of the input glyphs. The geometric and topological variety of three letters is depicted in each row, along with the optimization's successful capture of the correct and consistent stroke arrangement for them.

With this augmented representation, we estimate the point-to-curve correspondence, and evaluate $E_{corr}$ according to the 2D Euclidean distance of each point on $P(O_c(\Theta))$ from the point most relevant to it on the glyph in this combined position-and-normal space. Matching and regularizing normal directions is especially useful for avoiding local minima at outline corners, as it helps the two sides of the outline roughly preserve their original orientations while aligning to the input, instead of collapsing onto one of the sides. This notion also further facilitates consistency, as it encourages the outlines to be similar to the skeletons, which are more similar to one another across different fonts.
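The sketch below illustrates the lifting just described: matches are found in the 4D position-and-normal space, while the returned distances are the plain 2D ones fed to $E_{corr}$. Array names are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def lift(points, normals, w_n):
    """Lift 2D points to [x, y, w_n * n_x, w_n * n_y]."""
    return np.hstack([points, w_n * normals])

def normal_aware_correspondence(glyph_pts, glyph_normals,
                                template_pts, template_normals, w_n):
    """For every template sample, find the glyph sample that is closest in the
    combined position-and-normal space, and return the 2D distances to those matches."""
    tree = cKDTree(lift(glyph_pts, glyph_normals, w_n))
    _, idx = tree.query(lift(template_pts, template_normals, w_n))
    matched = glyph_pts[idx]
    return np.linalg.norm(template_pts - matched, axis=1)
```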

5.1.3. Feature points alignment

Feature points, i.e. points of a large change in outline direction, are important for glyph appearance and provide a useful cue in the fitting process. Thus, aligning these points is essential for constructing a consistent database. We denote by $F(G_{c,f})$ the set of points which are the centres of such large direction changes. That is, if the curve changes its direction by an angle greater than $\pi/6$ over a short distance, we mark the middle of this region as a feature point. Examining regions instead of pointwise changes in direction allows detecting both sharp and round corners.

To define these points on the template, we observe that sharp turns are more likely to occur at the boundary of two different strokes; thus we only consider junctions of outline segments as template feature points and denote them as $J(O_c(\Theta))$. This ensures that a specific outline segment does not undergo unnecessary turns in some glyphs compared to others, as demonstrated in Figure 6 (bottom right). To avoid foldovers in cases when feature points in $G_{c,f}$ are close to one another in Euclidean space but far away along the curve, we measure the distances intrinsically rather than extrinsically for this term. Specifically, we define an intrinsic distance between a point $x \in J(O_c(\Theta))$ and the feature point set $F(G_{c,f})$ as

$$D_{curve}(x, F(G_{c,f})) = \min_{y \in F(G_{c,f})} \mathrm{Arclength}\big(\mathrm{Proj}(x, G_{c,f}), y\big), \qquad (2)$$

where $x$ is first projected onto the glyph outline so that arclength can be measured:

$$\mathrm{Proj}(x, G_{c,f}) = \arg\min_{y \in G_{c,f}} D(x, y). \qquad (3)$$

Since we do not expect $J(O_c(\Theta))$ to be in perfect correspondence with the set $F(G_{c,f})$, we only match the subset of junction points that have a feature point nearby (i.e. $D_{curve} < \tau_{ft}$, with $\tau_{ft} = 0.15$ times the average segment length). Denoting this set as $F_{\tau_{ft}}(O_c(\Theta)) \subset J(O_c(\Theta))$, we can quantify feature alignment as

$$E_{ft}(G_{c,f}, \Theta) = \sum_{x \in F_{\tau_{ft}}(O_c(\Theta))} D_{curve}(x, F(G_{c,f})). \qquad (4)$$

5.2. Optimization

Starting with an initial template deformation $\Theta^0$, we use gradient descent to find the optimal deformation parameters. We found that directly optimizing the entire objective $E = E_{corr} + E_{ft}$ is prone to local minima and slow convergence. Thus, we opted for a procedure that alternates between optimizing $E_{corr}$ and $E_{ft}$. We rely on correspondence matching in early gradient descent iterations and slowly increase the influence of the feature-point compatibility term. This is controlled via the parameter $r_{ft}$, the ratio of the distance the feature points are allowed to move relative to their projected targets, which increases with every iteration. Similarly, we adjust the weight $w_n$ of the normal matching at every iteration. In particular, the optimization parameters start at $\sigma_{corr} = 20$, $w_n = 25$, $r_{ft} = 0.3$ and end at $\sigma_{corr} = 5$, $w_n = 0.01$, $r_{ft} = 1$. We optimize for $N_{iter} = 70$ iterations and update the weights at every iteration by interpolating them linearly with respect to the iteration count. Figure 6 demonstrates an initial guess and optimized fitting results with some terms of the objective function omitted.
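A minimal sketch of the alternating schedule with the linearly interpolated weights quoted above. The two step functions stand in for the actual gradient updates of $E_{corr}$ and $E_{ft}$ and are assumed to be supplied by the caller; nothing here is the authors' implementation.

```python
def fit_template(theta0, grad_corr_step, grad_ft_step, n_iter=70):
    """Alternate E_corr / E_ft gradient steps while interpolating the schedule
    sigma_corr: 20 -> 5, w_n: 25 -> 0.01, r_ft: 0.3 -> 1 over n_iter iterations."""
    theta = theta0
    for it in range(n_iter):
        a = it / max(n_iter - 1, 1)                 # 0 at the first iteration, 1 at the last
        sigma_corr = (1 - a) * 20.0 + a * 5.0
        w_n        = (1 - a) * 25.0 + a * 0.01
        r_ft       = (1 - a) * 0.3  + a * 1.0
        theta = grad_corr_step(theta, sigma_corr, w_n)   # correspondence term
        theta = grad_ft_step(theta, r_ft)                # feature-point term
    return theta
```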

5.3. Initializing template deformation

Starting with the initial example strokes in $E_c$, we now obtain an initial guess for our fitting procedure, $\Theta^0$. See Figure 3 for an example of a stroke structure in $E_c$. These approximate examples lack outline geometry and are often too dissimilar from input glyphs to serve as our initial guess for the outline optimization. Instead, we start by extracting a skeleton from the input glyph and fitting every example stroke structure to it. These stroke structures exhibit far less variance than the full outline, which naturally simplifies this step. Once we fit the strokes, it is simple to estimate uniform-width initial outlines around them. We then pick the parameters that yield the best energy to define the initial state for the outline optimization, $\Theta^0$.


Figure 5: We illustrate our skeleton fitting pipeline. Given an input glyph, we first extract its skeleton (a). We then register this skeleton to a template (b) to obtain an initial segmentation (c). We then use CRF-based segmentation to obtain the final consistent segmentation of the skeleton with respect to the template (d).

5.3.1. Stroke initialization, $\Theta^0_s$

To fit the stroke parameters to an input glyph $G_{c,f}$, we first estimate its skeleton $S_{c,f}$. We use a method based on the shape diameter function [SSCO08] to extract the major stroke skeletons defined by the outline. We then connect these regions by tracing a shortest path between them via medial-axis points, forming a connected graph of skeleton paths, where most vertices have degree 2. This procedure provides a single connected curve network for the skeleton, which, unlike the medial axis, captures only the main strokes, reducing sensitivity to small outline perturbations. See Figure 5(a) for an example output of this procedure.

Our next step is to fit the template strokes to the extracted skeleton. We found that directly deforming the template skeletons is prone to failure, so instead we segment and parametrize the extracted skeleton consistently with the template. Strokes are continuous and smooth curves, and their junctions typically occur at sharp skeleton features or at intersections with other strokes (or skeleton curves). We therefore formulate the segmentation of the skeleton according to the template as a CRF-based objective, where the nodes are skeleton points, the unary term is defined by registration of the skeleton to the template and the pairwise term favours cuts at sharp features and intersections. In particular, we optimize

$$E = \sum_{x \in V} W_u \cdot U(x, L(x)) + \sum_{x, y \in V,\; L(x) \neq L(y)} W_b \cdot B(x, y), \qquad (5)$$

where the unary term $U(x, l)$ is a penalty for assigning a point $x$ the label $l$ and the binary term $B(x, y)$ is a penalty for a pair of adjacent skeleton points having different labels. These are formulated as

$$U(x, l) = h_{\sigma_u}(D_l(x)), \qquad (6)$$

$$B(x, y) = \begin{cases} 1 - h_{\sigma_a}(A(x, y)), & (x, y) \in E \wedge x \notin J, \\ 0, & \text{otherwise.} \end{cases} \qquad (7)$$

To define the unary term, we use $D_l(x)$, the distance between the extracted skeleton and the $l$-th template skeleton segment after the skeleton is registered onto the template skeleton using the Coherent Point Drift method [MSCP*07]. To define the binary term, we use the angle between the normals of adjacent points, $A(x, y)$, where $E$ denotes adjacency and $J$ denotes junctions (points where more than two skeleton paths meet). Both terms are smoothed with the aforementioned Gaussian kernel $h_\sigma$, and we set $\sigma_u = 0.25$, $\sigma_a = \sqrt{3}$, $W_u = 1$, $W_b = 4$.
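A minimal sketch that only evaluates the CRF energy of Equations (5)-(7) for a candidate labelling; solving for the optimal labelling (e.g. with graph cuts) is left out, and all array and argument names are assumptions.

```python
import numpy as np

def h(x, sigma):
    return 1.0 - np.exp(-(x / sigma) ** 2)

def crf_energy(labels, dist_to_segment, edges, normal_angles, is_junction,
               sigma_u=0.25, sigma_a=np.sqrt(3), w_u=1.0, w_b=4.0):
    """Energy of Eq. (5) for one labelling of the skeleton points.
    labels[i]              -- integer template segment assigned to skeleton point i
    dist_to_segment[i, l]  -- D_l(i): distance from point i to template segment l after CPD registration
    edges                  -- list of (i, j) pairs of adjacent skeleton points
    normal_angles[(i, j)]  -- A(i, j): angle between the normals at i and j
    is_junction[i]         -- True where more than two skeleton paths meet"""
    unary = w_u * h(dist_to_segment[np.arange(len(labels)), labels], sigma_u).sum()
    pairwise = sum(w_b * (1.0 - h(normal_angles[(i, j)], sigma_a))
                   for (i, j) in edges
                   if labels[i] != labels[j] and not is_junction[i])
    return unary + pairwise
```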


Figure 6: Outline optimization. Given an input glyph and its calculated skeleton (top left), we use its segmented version (see Section 5.3.1) to generate uniform-width outlines around it (top middle). Our initial guess $O_c(\Theta^0)$ is the latter, after incorporating the template's connectivity constraints (top right). Our fitting procedure takes three considerations into account. Optimizing using only $E_{corr}$ (see Section 5.1.1) can attract incompatible regions, as can be seen in the red box (bottom left). Considering normal directions (see Section 5.1.2) mitigates this issue (bottom middle). Finally, incorporating feature point alignment (see Section 5.1.3) is crucial for a semantically meaningful fitting, inducing a consistent representation across different fonts. This is highlighted in the red circles (bottom right).

After the skeleton is consistently segmented with the template, we parametrize every segment according to the template $T_c(\theta)$, i.e. we fit a Bezier curve to match each stroke $s_{c,f}$. Once the strokes of the glyph are fitted, we next turn to initializing the outlines.

5.3.2. Outline initialization, $\Theta^0_o$

We represent the outline in the coordinate system of the extracted skeleton segment (see Figure 2). To create an initial outline estimate, we first generate, for every stroke, a profile curve at a constant distance that fits closely to the input outline (i.e. such that the average distance between the curve and the closest point on the outline is minimized) (see Figure 6). The resulting outline is typically not continuous, since the curve parameters are estimated independently for each stroke. We enforce C0 continuity at junctions of outline segments by snapping the corresponding points to their average position. Sometimes this averaged control point can flip to the wrong side of the skeleton, which leads to inferior optimization performance. We detect these cases and project the averaged control point back to the correct side of the corresponding skeleton segment.
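A short sketch of the C0 snapping step. The sign test against the local skeleton tangent and the reflection used to push a flipped point back are assumed choices; the text only states that the point is projected back to the correct side.

```python
import numpy as np

def snap_junction(end_a, end_b, skel_point, skel_tangent):
    """Snap two outline endpoints meeting at a junction to their average position,
    keeping the result on the same side of the skeleton as the original endpoints."""
    def side(p):
        d = p - skel_point
        return np.sign(skel_tangent[0] * d[1] - skel_tangent[1] * d[0])  # 2D cross product sign

    avg = 0.5 * (end_a + end_b)
    if side(avg) != side(end_a):                 # the averaged point flipped sides of the skeleton
        t = skel_tangent / np.linalg.norm(skel_tangent)
        rel = avg - skel_point
        avg = skel_point + 2.0 * np.dot(rel, t) * t - rel   # reflect across the skeleton line
    return avg
```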

6. Learning

Given a collection of fonts $F = \{F_1, F_2, \ldots, F_N\}$, we use our glyph fitting technique to represent all letter glyphs consistently. We concatenate per-glyph parameters to create a feature vector representing a font: $F_i = [\Theta_{\text{'a'}}, \Theta_{\text{'b'}}, \ldots, \Theta_{\text{'z'}}]$. In this representation, each font becomes a point in a high-dimensional feature space: $F_i \in \mathbb{R}^{D_{font}}$, where $D_{font} = 3998$ (the number of parameters required to represent all glyph parts in a font, after enforcement of continuity and template part-sharing constraints). We expect this representation to be redundant due to stylistic and structural similarities in fonts: glyphs in the same font will share stylistic elements, and glyphs that correspond to the same letter will have similar stroke structure. We propose to learn these relationships by projecting all fonts to a lower-dimensional embedding. The main motivation behind this is to create a space where important correlations between different attributes are captured implicitly, so that any sample point $X$ in this representation will implicitly adhere to the common design principles and font structures in the training data $F$. For this, we need an embedding function $M$ that projects a font to a low-dimensional feature space, i.e. $M(F) = X$, where $X \in \mathbb{R}^{D}$, and an inverse function $C$ that can reconstruct a font from the embedding, $C(X) = F$. To enforce learning correlations in the data, we set the latent dimensionality $D = \alpha D_{font} = 39$ with $\alpha = 0.01$, which was determined experimentally to work best for our setup.

In addition to capturing important correlations, we want the embedding function to handle missing entries in the input feature vectors $F_i$, because most fonts do not have glyphs with all possible topologies (i.e. decorative elements and stroke structures). Moreover, in many applications, in order to help the typeface designer, we need to be able to reason about the whole font before it has been completed. To satisfy this requirement, we pick a linear embedding, with a slight abuse of notation $M \in \mathbb{R}^{D_{font} \times D}$, and compute it from the data points $F$ via the EM-PCA method [Row98]. This generalization of PCA easily scales to large data sets and can handle missing data at train and test time. The method alternates between the E-step, which optimizes for $X = [X_1, \ldots, X_N]$, the embedding coordinates of all fonts $F$, and the M-step, which optimizes for the linear maps $M, C$. In this case, $C$ spans the space of the first $D$ principal components, and we can compute $M = (C^T C)^{-1} C^T$. To find an optimal linear map, the optimization starts with a random matrix $M$. In the E-step, we set $X = MF$; in the case of missing data, each $X_i$ and $F_i$ are instead optimized to minimize the norm $\|C X_i - F_i\|$, where the missing entries of $F_i$ and the entire vector $X_i$ are free variables, and the problem is solved in the least-squares sense. In the M-step, we compute $C_{new} = F X^T (X X^T)^{-1}$. The process iterates between the E-step and M-step $K$ times to estimate the matrix $M$ and all projections $X$ (in all our experiments $K = 100$).

While some existing non-linear manifold learning algorithms also have variants that address missing data (e.g. GP-LVM [NFC07] and autoencoders [SKG*05]), we perform an empirical comparison and show that a simpler linear model works better with our data set (see Section 8.2 for discussion).

7. Iterative Manifold Evolution

The initial manifold is learned by starting the fitting optimization from the skeleton-based initialization (see Section 5.3.2), measuring the fitting error (see Section 8.1) and retaining only those fits with $err(\Theta) < T$ to estimate a manifold. Given this manifold, we project each font to the manifold and reconstruct it: $M^{-1}(M(F_i))$. We then use the resulting parameters to initialize the optimization procedure for every glyph with $err(\Theta) > T$ (i.e. completing the style of glyphs with high error).


Figure 7: Sample fit improvements using iterative manifold evolution. Starting with a manifold-generated initial guess significantly reduces the final fitting error in many cases.

For example, in Figure 7, we show an example of a failed fit from a skeleton-based initial guess (template fitting commonly fails because the skeleton-based initialization of the template parameters is too far from the global minimum), and the improved guess after manifold reprojection.

We iteratively estimate the manifold and reproject the remaining high-error glyphs in each iteration. As can be seen in Figure 8, the overall fitting error decreases and the fraction of successful fits increases. All results presented in this paper correspond to the fifth iteration of this procedure.
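A schematic sketch of this loop, assuming caller-supplied fitting, error, manifold-learning and reprojection routines (all names here are placeholders rather than the authors' API).

```python
def iterative_manifold_evolution(glyphs, fit, error, learn_manifold, reproject,
                                 threshold=1.0, n_rounds=5):
    """Alternate between learning the manifold from good fits and re-initializing
    bad fits from their manifold reconstruction. `fit(glyph, init)` returns template
    parameters, `error` measures fit quality, `learn_manifold` trains the EM-PCA model
    and `reproject(model, params)` returns the reconstruction M^-1(M(params))."""
    params = {g: fit(g, init=None) for g in glyphs}           # skeleton-based initialization
    model = None
    for _ in range(n_rounds):
        good = {g: p for g, p in params.items() if error(g, p) < threshold}
        model = learn_manifold(list(good.values()))
        for g, p in params.items():
            if error(g, p) >= threshold:
                params[g] = fit(g, init=reproject(model, p))  # manifold-guided refit
    return params, model
```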

8. Evaluation

8.1. Fitting evaluation

We test our method on 570 fonts obtained from an online database [Fon17]. Although more fonts are available, we restricted ourselves to normal monospace, sans-serif, serif and slab-serif fonts, which still provides a data set that is an order of magnitude larger and more diverse than what has been used in previous work. We fit our stroke-based representation to every glyph in this font collection, providing a consistent representation across all fonts. Figure 4 shows some example fits that result from the iterative fitting procedure.

To quantitatively evaluate the expressiveness of our deformable template and the accuracy of our fitting procedure, we compare the outline generated by our template with the outline of the input glyph. To compare two curves, we densely sample each curve so that the distance between consecutive points is 0.3 pixels (each glyph is scaled and rendered at a resolution of 200 × 200 pixels), and then for each sample we compute the distance to the closest point on the other curve. We then average these distances in a root-mean-square manner.
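A sketch of this error metric, assuming the two outlines have already been densely sampled as described; the helper that produces the samplings is not shown.

```python
import numpy as np
from scipy.spatial import cKDTree

def rms_outline_error(samples_a, samples_b):
    """Symmetric root-mean-square closest-point distance between two outlines,
    each given as a dense (N, 2) sampling (about 0.3 px spacing at 200 x 200 px)."""
    d_ab, _ = cKDTree(samples_b).query(samples_a)   # each point of A to the closest point of B
    d_ba, _ = cKDTree(samples_a).query(samples_b)   # and vice versa
    return np.sqrt(np.mean(np.concatenate([d_ab, d_ba]) ** 2))
```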


Figure 8: Iterative manifold evolution gradually increases the percentage of fits accepted for learning and reduces the mean fitting error.

Figure 9: Fitting results by glyph. For all glyphs except 'g', over 70–80% can be fitted well enough to be used for learning. 'g' is the hardest case to fit due to high variation in both topology and shape.

We report these results in Figure 9, where the x-axis is an average fitting-error threshold and the y-axis is the fraction of glyphs whose average error is below the threshold. We observe that errors vary depending on the glyph; for example, 'g' poses the most challenges due to the significant variance in its topology (e.g. the lower tail can be attached at various places on the upper loop and might have very diverse shapes). Figure 10 provides some representative visualizations of glyph appearance under different fitting errors.

To learn a manifold of fonts, we pick a conservative threshold of T = 1.0 pixels average error to ensure that we learn only meaningful correlations. At test time, however, we can still project any glyph to the manifold. Of course, the projection and reconstruction quality will suffer as the quality of the fits degrades.

Figure 10: Glyph fits at different degrees of fitting error. The threshold of T = 1.0 is picked as acceptable for learning purposes. During test time, worse fits might also be successfully projected onto the manifold.

In the supplementary material, we provide an extended version of all our experiments over many of the fonts: we show fitting result visualizations for a randomly selected sample of fonts, and the three application results run over a larger set of fonts.

8.2. Comparison to prior work

We evaluate both the representation and the learning component of the method in comparison to previous work.

Representation Comparison We compare our method to the state-of-the-art method of Campbell and Kautz [CK14] and a raster-based approach. The goal of all three methods is to produce a consistent font representation; thus we evaluate each of the three representations using interpolation, which allows one to visualize qualitative changes in consistency as one transitions from one style to the other. Interpolation is the most direct way to measure consistency, while evaluation on other tasks, such as style prediction, will be confounded by properties of the learning algorithm.

For both our approach and that of Campbell and Kautz [CK14], we employ the same interpolation scheme in a PCA space learned from the same set of 12 fonts. We investigated whether results for the latter approach differ when performed without PCA reduction, and found that the results are similar (see the supplementary material). For the raster-based approach, we learn a VAE [KW13] manifold on 64 × 64 rasterizations of glyphs with two fully connected layers of sizes 256 and 10, where the last layer performs sampling (as in [KW13]). Typically, neural networks require more training data, so to provide a fair comparison, we augment the training data set with rasterizations from our full font data set, since the data sets contained different fonts (for a total of 582 fonts). While the results of the raster-based method will differ with a different choice of training data or architecture, they will still exhibit blurry artefacts, as demonstrated in Figures 11a and 11b.
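The raster baseline is specified above only by its input size and layer widths; the PyTorch sketch below is one plausible instantiation of such a model. The activations, decoder and loss are assumptions, and this is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RasterGlyphVAE(nn.Module):
    """Sketch of the raster baseline: 64x64 glyph images, one 256-unit hidden layer
    and a 10-D latent sampled with the usual reparameterization trick."""
    def __init__(self, image_size=64, hidden=256, latent=10):
        super().__init__()
        d = image_size * image_size
        self.enc = nn.Linear(d, hidden)
        self.mu = nn.Linear(hidden, latent)
        self.logvar = nn.Linear(hidden, latent)
        self.dec1 = nn.Linear(latent, hidden)
        self.dec2 = nn.Linear(hidden, d)

    def encode(self, x):
        h = F.relu(self.enc(x))
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)      # sampling layer

    def decode(self, z):
        return torch.sigmoid(self.dec2(F.relu(self.dec1(z))))

    def forward(self, x):
        mu, logvar = self.encode(x.flatten(1))
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence to a standard normal prior.
    bce = F.binary_cross_entropy(recon, x.flatten(1), reduction='sum')
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld
```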

Unlike Campbell and Kautz [CK14], our model does not require that outline vertices on two glyphs of different styles but the same character have a bijective correspondence. This correspondence is ill-defined when comparing distinct topologies, as demonstrated in Figure 11a. This allows our method to explicitly interpolate between corresponding parts, which may be preferred to gradually degenerating serifs that are present in only one of the interpolants. In addition, the explicit enforcement of corners in the fitting procedure often yields better preservation of sharp features in the resulting intermediate representations, as can be seen at the sharp corners of the ends of the serifs. On the other hand, the explicit correspondence aids in producing better quality results where this correspondence is indeed meaningful: for example, the intermediate interpolations of Campbell and Kautz [CK14] between two glyphs of the same topology better respect the parallel constraints of the stems of 'h' (in Figure 11a) and generate slightly better width/curvature variation (in Figure 11b), compared to our method. Finally, both vector-based approaches outperform the raster-based method, which inherently produces blurry intermediate interpolations.

Figure 11: Comparison of different representations (rows: our part-based representation, Campbell and Kautz [CK14] and the raster-based approach). (a) Interpolations between fonts of different topology (TimesNewRomanPSMT and TrebuchetMS). Interpolation between serifed and non-serifed fonts yields interpolations only between corresponding parts. Decorative elements, such as serifs, that are present in only one font are purposely not interpolated in our method, and the Bezier-based representation avoids the blurry artifacts seen in the raster-based approach. (b) Interpolations between fonts of similar topology (TimesNewRomanPSMT and Georgia). The quality of interpolation between same-topology fonts is the same. Since our method enforces corner constraints during fitting, corners are better preserved during the interpolation process (seen at the sharp corners at the edges of the serifs).

Generative Model Comparison To evaluate our generative model, we empirically compare it to a non-linear VAE [KW13] learned directly on the font representation from the fitting step. We implemented two variants: a vanilla version that learns on examples with consistent topology (no missing parts) and a denoising non-linear VAE [VLBM08] that can handle different topologies and missing parts (please see supplementary material for training and architecture details). The vanilla non-linear VAE produced smooth interpolations on examples in its domain (see Figure 12(b)). However, the denoising non-linear VAE failed to produce meaningful interpolations between examples, introducing unnatural part deformations. This is likely due to the high dimensionality of our model and the lack of training data samples (even after extensive data augmentation), which hinders the network from capturing the correct posterior distribution and learning a more complex model without overfitting. This phenomenon is often referred to as the curse of dimensionality [Bel15]. In addition, learning on a cross-topology data set requires imputing missing values (which are a form of noise), and, as has been highlighted [EHN*98, PPS03], linear methods may be preferable to non-linear models in low signal-to-noise ratio scenarios. Finally, the missing values in our case are 'missing not at random' (MNAR), which is known to be challenging for existing methods [TLZJ17]. As a result, we found that for the considered scenario of MNAR and signal-to-noise ratio, a linear model is preferred. We thus use a simpler linear model (EM-PCA) to demonstrate the proposed applications, and leave the analysis and development of more complex generative models (such as the generalized non-linear VAE [WHWW14], which additionally considers data relationships to mitigate the dimensionality issue) for future work.

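As a rough illustration of the linear alternative, an EM-style PCA update in the spirit of [Row98], extended with a naive imputation of missing entries, might look as follows; the matrix shapes and the imputation scheme are assumptions for illustration rather than the exact training procedure.

```python
import numpy as np

def em_pca_missing(Y, k, n_iter=100, seed=0):
    """EM algorithm for PCA (in the spirit of [Row98]) with naive imputation of
    missing values. Y: (d, n) data matrix, assumed mean-centred, with np.nan for
    missing entries. Returns the (d, k) basis W and (k, n) latent coordinates X.
    """
    rng = np.random.default_rng(seed)
    d, n = Y.shape
    mask = ~np.isnan(Y)
    Yf = np.where(mask, Y, 0.0)              # initialize missing entries with 0
    W = rng.standard_normal((d, k))
    for _ in range(n_iter):
        # E-step: latent coordinates given the current basis.
        X = np.linalg.solve(W.T @ W, W.T @ Yf)
        # M-step: basis given the current coordinates.
        W = Yf @ X.T @ np.linalg.inv(X @ X.T)
        # Re-impute missing entries with the current reconstruction.
        Yf = np.where(mask, Y, W @ X)
    return W, X
```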
Figure 12: Comparison of different generative models: EM-PCA [Row98], the denoising non-linear variational autoencoder (VAE) and the vanilla non-linear VAE. (a) Interpolations between fonts of different topology (TimesNewRomanPSMT and TrebuchetMS fonts). (b) Interpolations between fonts of similar topology (TimesNewRomanPSMT and Georgia fonts). Our method is able to produce meaningful interpolations between glyphs of the same topology as well as glyphs of different topologies. The vanilla VAE is only able to produce meaningful interpolations between glyphs of the same topology; when applied to multi-topology data, the denoising VAE introduces unnatural part variations.

A parametrized font database and a low-dimensional manifold enable a variety of exploratory applications that aim to browse and predict consistent glyph styles. Since it is challenging to quantitatively evaluate style, we follow the approach of prior work in showcasing a variety of qualitative experiments that demonstrate various applications of our method.

9. Applications

Our font representation can be used in a variety of applications for modelling, organizing and analysing fonts. In this work, we focus on three applications that demonstrate the benefit of our method. First, our part-aware model enables the user to impose topological constraints when retrieving the most similar font in a database (e.g. find a font that is most similar to this sans-serif font, but with serifs). Second, we demonstrate font completion results enabled by the use of an EM-PCA model that can handle missing data and the effective factorization of fonts into strokes and outlines in our representation.

To ensure that our results are representative of real-world use cases, we split the database into training (370 fonts) and testing (200 fonts) sets, where we learn our manifold on the training data, and use the test data as queries either for retrieval or completion. For visualization, each glyph outline is drawn in its scaled unit box representation used for training (scale parameters may not be known at testing time).

Topology-aware retrieval By consistently parametrizing a large variety of fonts and embedding them in a low-dimensional manifold, we implicitly organize the font database and learn font similarities in the context of the entire data set. As was demonstrated before [CK14], this can facilitate font retrieval. In addition, our topology-aware representation enables us to constrain the retrieved results to have certain features (e.g. serifs, or a particular stroke structure). In Figure 13, we present two queries of the database. For each query font (top row), we retrieve the most similar (second row) and most dissimilar (third row) fonts. We can also impose additional constraints on the retrieved font, such as requiring that it have serifs (fourth row) or a different topology for some glyphs (fifth row).

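Schematically, such a constrained query over the learned manifold coordinates could look like the sketch below; the manifold coordinates, topology descriptors and distance measure are hypothetical placeholders rather than the actual retrieval code.

```python
import numpy as np

def retrieve(query_coords, db_coords, db_topologies, constraint=None, farthest=False):
    """Retrieve the nearest (or farthest) database font to a query in manifold space.

    query_coords  : low-dimensional coordinates of the projected query font
    db_coords     : (n_fonts, dim) manifold coordinates of the database fonts
    db_topologies : per-font topology descriptors (e.g. which parts/serifs exist)
    constraint    : optional predicate on a topology descriptor, e.g.
                    lambda topo: topo['serifs']  (only retrieve serifed fonts)
    """
    dists = np.linalg.norm(db_coords - query_coords, axis=1)
    if constraint is not None:
        allowed = np.array([constraint(t) for t in db_topologies])
        # Exclude fonts violating the topology constraint from the search.
        dists = np.where(allowed, dists, -np.inf if farthest else np.inf)
    return int(np.argmax(dists)) if farthest else int(np.argmin(dists))
```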
Font completion Since our learning method allows projecting a partial font vector to the manifold, we can reconstruct the full font from this partial projection and thus complete the remaining glyphs. In this experiment, we hold out a subset of glyphs and use the manifold to predict them. In Figure 14, we visualize some sample completion predictions. Given a font consisting only of the glyphs in the word 'hamburgefon', we use the first seven letters to predict the style of the last three. It can be seen that our system correctly predicts the slant, part widths, italics and decorative elements. Specifically, notice the consistency in slant in Figure 14 (2, 4, 7, 9). The system is able to infer the correct variation of extremely thin styles such as (3) and (8), and thicker styles such as (2) and (5). Finally, notice that our system correctly predicted the existence of decorative elements in the 'f' and 'n' glyphs in examples (11–14), based on their presence in the input glyphs.

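Schematically, completion amounts to projecting the observed entries of a font vector onto the learned linear basis and reconstructing the rest; a minimal sketch under such a PCA-style model, with hypothetical variable names, is shown below.

```python
import numpy as np

def complete_font(partial, observed_mask, mean, W):
    """Complete a font vector from a partial observation using a linear manifold.

    partial       : (d,) font vector with arbitrary values at unobserved entries
    observed_mask : (d,) boolean array, True where glyph parameters are given
    mean, W       : (d,) mean font vector and (d, k) basis of the learned manifold
    """
    Wo = W[observed_mask]                        # basis rows for observed entries
    yo = partial[observed_mask] - mean[observed_mask]
    # Least-squares projection onto the manifold using only observed coordinates.
    z, *_ = np.linalg.lstsq(Wo, yo, rcond=None)
    return mean + W @ z                          # reconstruct the full font vector
```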
An interesting question is how many glyphs need to be designed in order to predict the rest. For example, one might want to design a subset of the most representative glyphs (such as the word 'handglove') and complete the rest.

Figure 13: Font Retrieval. We project the query (testing) font into the manifold and search for existing fonts in the manifold. It can be seen that the nearest font is stylistically very similar, while the farthest font is quite distinct. We also demonstrate topology-based queries, which take advantage of our part-based glyph representation. The style of the nearest fonts under these topology constraints also nicely matches the query font style.

In Figure 15, we complete the fonts UbuntuMono-Italic and AveriaSansLibre-BoldItalic based on different numbers of given letters. As more and more letters are given, the prediction quality of glyphs that generally exhibit a lot of variation improves (such as 'f' and 'm'), while simpler glyphs such as 'x', 'y' and 'z' are correctly inferred even from only a small subset of provided glyphs. While the predicted glyphs are generally below the quality of manually designed fonts, the predictions suggest that the model does capture the stylistic co-occurrences necessary for generating a stylistically compatible set of glyphs.

10. Conclusion

This paper has proposed a new method for extracting a consistent parametrization for a large collection of fonts that can be used to assist font design applications. In comparison to previous work, the key novelty is the automatically extracted and part-aware font representation, which describes a glyph outline with respect to a segmented, centre-line skeleton and is suitable for learning a font manifold. We provide template fitting algorithms to extract this representation automatically and investigate manifold learning algorithms to factor co-occurrences within and across fonts in this representation. We demonstrate that the low-dimensional manifold learned with the proposed methods is useful for a new font design application, font completion, as well as traditional applications like font retrieval and font interpolation.


Figure 14: Completion of fonts in different styles. The input glyphs are coloured blue and the predicted glyphs are coloured yellow.

This work is one step towards font design by non-experts. It provides the underlying framework for synthesizing new fonts within a manifold learned from data. However, it could be extended in many ways. First, it could be integrated with interactive tools that provide real-time feedback, user-provided constraints and post-processing techniques that generate print-quality glyphs from coarse predictions of the learning step. Secondly, it could be extended to handle wider varieties of fonts and characters (for example, Cyrillic scripts). Finally, it could be adapted to handle data from other fields where 2D boundary curves must be co-parametrized and analysed (e.g. medicine, agriculture and others).

Acknowledgements

We would like to thank Ariel Shamir, the Princeton Graphics and Vision groups, and the anonymous reviewers for suggestions and discussions. We thank Asya Dvorkin for help with setting up experiments. During this work, Elena Balashova was supported by the National Science Foundation Graduate Fellowship (NSF-GRFP). Support was also provided by NSF CRI 1729971 and Adobe.

Figure 15: Completion robustness experiment. The number of letters necessary for completion is examined. The input glyphs are coloured blue and the predicted glyphs are coloured yellow.

References

[Ada86] ADAMS D. A.: A Dialogue of Forms: Letter and Digital Font Design. PhD thesis, Massachusetts Institute of Technology, 1986.

[Ada89] ADAMS D.: abcdefg (A better constraint driven environment for font generation). In Proceedings of International Conference on Raster Imaging and Digital Typography (Lausanne, Switzerland, 1989), pp. 54–70.

[Ado97] ADOBE: Designing multiple master typefaces. Tech. Rep., 1997.

[as8] as8: Typography & type design. Available at http://www.as8.it/edu/typography.html. Accessed April 9, 2018.

[AXZ*15] ALHASHIM I., XU K., ZHUANG Y., CAO J., SIMARI P., ZHANG H.: Deformation-driven topology-varying 3D shape correspondence. ACM Transactions on Graphics 34, 6 (Oct. 2015), 236:1–236:13. https://doi.org/10.1145/2816795.2818088.

[Bel15] BELLMAN R. E.: Adaptive Control Processes: A Guided Tour. Princeton University Press, Princeton, 2015.

[Car95] CARTER S.: Twentieth Century Type Designers. Lund Humphries, London, 1995.

[CK14] CAMPBELL N. D., KAUTZ J.: Learning a manifold of fonts. ACM Transactions on Graphics 33, 4 (2014), 91:1–91:11.

[EHN*98] ENNIS M., HINTON G., NAYLOR D., REVOW M., TIBSHIRANI R.: A comparison of statistical learning methods on the GUSTO database. Statistics in Medicine 17, 21 (1998), 2501–2508.

[Fon17] FONTS: Fonts database. Available at http://www.1001fonts.com/. Accessed September 1, 2016.

[Gon98] GONCZAROWSKI J.: Producing the skeleton of a character. In Electronic Publishing, Artistic Imaging, and Digital Typography. Springer, Berlin Heidelberg (1998), pp. 66–76.

[HB91] HERSCH R. D., BETRISEY C.: Model-based matching and hinting of fonts. ACM SIGGRAPH Computer Graphics 25 (1991), 71–80.

[Her94] HERSCH R. D.: Font rasterization: The state of the art. In From Object Modelling to Advanced Visual Communication. Springer, New York (1994), pp. 274–296.

[HH01] HU C., HERSCH R. D.: Parameterizable fonts based on shape components. IEEE Computer Graphics and Applications 21, 3 (2001), 70–85.

[hin] Typotheque: Font hinting. Available at https://www.typotheque.com/articles/hinting. Accessed April 9, 2018.

[JPF06] JAKUBIAK E. J., PERRY R. N., FRISKEN S. F.: An improved representation for stroke-based fonts. In Proceedings of ACM SIGGRAPH (Boston, MA, USA, 2006), vol. 4.

[KCKK12] KALOGERAKIS E., CHAUDHURI S., KOLLER D., KOLTUN V.: A probabilistic model for component-based shape synthesis. ACM Transactions on Graphics 31, 4 (2012), 55:1–55:11.

[KLM*13] KIM V. G., LI W., MITRA N. J., CHAUDHURI S., DIVERDI S., FUNKHOUSER T.: Learning part-based templates from large collections of 3D shapes. ACM Transactions on Graphics 32, 4 (2013), 70:1–70:12.

[Knu86] KNUTH D. E.: METAFONT: The Program. Addison-Wesley, Reading, MA, 1986.

[KW13] KINGMA D. P., WELLING M.: Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.

[LHLF15] LIU T., HERTZMANN A., LI W., FUNKHOUSER T.: Style compatibility for 3D furniture models. ACM Transactions on Graphics 34, 4 (2015), 85:1–85:9.

[LKS15] LUN Z., KALOGERAKIS E., SHEFFER A.: Elements of style: Learning perceptual shape style similarity. ACM Transactions on Graphics 34, 4 (2015), 84:1–84:14.

[LKWS16] LUN Z., KALOGERAKIS E., WANG R., SHEFFER A.: Functionality preserving shape style transfer. ACM Transactions on Graphics 35, 6 (2016), 209:1–209:14.

[LZX16] LIAN Z., ZHAO B., XIAO J.: Automatic generation of large-scale handwriting fonts via style learning. In SIGGRAPH ASIA 2016 Technical Briefs (2016), ACM, pp. 12:1–12:14.

[Mic16] MICROSOFT: Opentype font variations. Available at https://www.microsoft.com/typography/otspec180/otvaroverview.htm. Accessed September 1, 2016.

[MSCP*07] MYRONENKO A., SONG X., CARREIRA-PERPINAN M. A.: Non-rigid point set registration: Coherent point drift. Advances in Neural Information Processing Systems 19 (2007), 1009–1016.

[NFC07] NAVARATNAM R., FITZGIBBON A. W., CIPOLLA R.: The joint manifold model for semi-supervised multi-valued regression. In Proceedings of IEEE 11th International Conference on Computer Vision (Rio de Janeiro, Brazil, Oct. 2007), pp. 1–8. https://doi.org/10.1109/ICCV.2007.4408976.

[OST14] OSTERER H., STAMM P., TYPOGRAPHIE S.: Adrian Frutiger – Typefaces: The Complete Works. Birkhauser, Basel, 2014.

[PFC15] PHAN H. Q., FU H., CHAN A. B.: Flexyfont: Learning transferring rules for flexible typeface synthesis. Computer Graphics Forum 34 (2015), 245–256.

[PPS03] PERLICH C., PROVOST F., SIMONOFF J. S.: Tree induction vs. logistic regression: A learning-curve analysis. Journal of Machine Learning Research 4 (2003), 211–255.


[PSS01] PRAUN E., SWELDENS W., SCHRODER P.: Consistent mesh parameterizations. In SIGGRAPH '01: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques (Los Angeles, CA, USA, 2001), ACM, pp. 179–184. https://doi.org/10.1145/383259.383277.

[Row98] ROWEIS S. T.: EM algorithms for PCA and SPCA. In Advances in Neural Information Processing Systems 10. M. I. Jordan, M. J. Kearns and S. A. Solla (Eds.). MIT Press, Cambridge, MA (1998), pp. 626–632.

[SI10] SUVEERANONT R., IGARASHI T.: Example-based automatic font generation. In Proceedings of Smart Graphics, LNCS (Banff, AB, Canada, 2010), pp. 127–138.

[SKG*05] SCHOLZ M., KAPLAN F., GUY C. L., KOPKA J., SELBIG J.: Non-linear PCA: A missing data approach. Bioinformatics 21, 20 (2005), 3887–3895.

[SR98] SHAMIR A., RAPPOPORT A.: Feature-based design of fonts using constraints. In Electronic Publishing, Artistic Imaging, and Digital Typography. Springer, Berlin Heidelberg (1998), pp. 93–108.

[SR99] SHAMIR A., RAPPOPORT A.: Compacting oriental fonts by optimizing parametric elements. Visual Computer 15, 6 (1999), 302–318.

[SSCO08] SHAPIRA L., SHAMIR A., COHEN-OR D.: Consistent mesh partitioning and skeletonisation using the shape diameter function. Visual Computer 24, 4 (2008), 249. https://doi.org/10.1007/s00371-007-0197-5.

[TLZJ17] TRAN L., LIU X., ZHOU J., JIN R.: Missing modalities imputation via cascaded residual autoencoder. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Honolulu, HI, USA, 2017), pp. 1405–1414.

[Tra03] TRACY W.: Letters of Credit: A View of Type Design. David R. Godine Publisher, Boston, 2003.

[Tsc98] TSCHICHOLD J.: The New Typography: A Handbook for Modern Designers (vol. 8). University of California Press, Berkeley, CA, 1998.

[USB16] UPCHURCH P., SNAVELY N., BALA K.: From A to Z: Supervised transfer of style and content using deep neural network generators. arXiv preprint arXiv:1603.02003, 2016.

[VLBM08] VINCENT P., LAROCHELLE H., BENGIO Y., MANZAGOL P.-A.: Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning (Helsinki, Finland, 2008), ACM, pp. 1096–1103.

[WHWW14] WANG W., HUANG Y., WANG Y., WANG L.: Generalized autoencoder: A neural network framework for dimensionality reduction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (Columbus, OH, USA, 2014), pp. 490–497.

[WYJ*15] WANG Z., YANG J., JIN H., SHECHTMAN E., AGARWALA A., BRANDT J., HUANG T. S.: DeepFont: Identify your font from an image. In Proceedings of the 23rd ACM International Conference on Multimedia (Brisbane, Australia, 2015), ACM, pp. 451–459.

[Zap87] ZAPF H.: Hermann Zapf & His Design Philosophy: Selected Articles and Lectures on Calligraphy and Contemporary Developments in Type Design, with Illustrations and Bibliographical Notes, and a Complete List of His Typefaces. Society of Typographic Arts, Chicago, IL, 1987.

Supporting Information

Additional supporting information may be found online in the Supporting Information section at the end of the article.

Figure 1: Visualization of templates used by our system.

Figure 2: Mean Hausdorff error for many-to-one completion.
