
Describing Textures in the Wild

Mircea Cimpoi¹, Subhransu Maji², Iasonas Kokkinos³, Sammy Mohamed⁴, and Andrea Vedaldi¹

¹Department of Engineering Science, University of Oxford
²Toyota Technological Institute, Chicago (TTIC)
³Center for Visual Computing, Ecole Centrale Paris
⁴Stony Brook University

Abstract

Patterns and textures are defining characteristics of many natural objects: a shirt can be striped, the wings of a butterfly can be veined, and the skin of an animal can be scaly. Aiming at supporting this analytical dimension in image understanding, we address the challenging problem of describing textures with semantic attributes. We identify a rich vocabulary of forty-seven texture terms and use them to describe a large dataset of patterns collected “in the wild”. The resulting Describable Textures Dataset (DTD) is the basis to seek the best texture representation for recognizing describable texture attributes in images. We port the Improved Fisher Vector (IFV) from object recognition to texture recognition and show that, surprisingly, it outperforms specialized texture descriptors not only on our problem, but also on established material recognition datasets. We also show that the describable attributes are excellent texture descriptors, transferring between datasets and tasks; in particular, combined with IFV, they significantly outperform the state of the art by more than 8% on both the FMD and KTH-TIPS-2b benchmarks. We also demonstrate that they produce intuitive descriptions of materials and Internet images.

1. Introduction

Recently visual attributes have raised significant interest in the community [6, 11, 17, 25]. A “visual attribute” is a property of an object that can be measured visually and has a semantic connotation, such as the shape of a hat or the color of a ball. Attributes allow characterizing objects in far greater detail than a category label and are therefore the key to several advanced applications, including understanding complex queries in semantic search, learning about objects from textual descriptions, and accounting for the content of images in great detail.


Figure 1: Both the man-made and the natural world are an abundant source of richly textured objects. The textures of objects shown above can be described (in no particular order) as dotted, striped, chequered, cracked, swirly, honeycombed, and scaly. We aim at identifying these attributes automatically and generating descriptions based on them.

Textural properties have an important role in object descriptions, particularly for those objects that are best qualified by a pattern, such as a shirt or the wing of a bird or butterfly, as illustrated in Fig. 1. Nevertheless, so far the attributes of textures have been investigated only tangentially. In this paper we ask whether there exists a “universal” set of attributes that can describe a wide range of texture patterns, whether these can be reliably estimated from images, and for what tasks they are useful.

The study of perceptual attributes of textures has a long history, starting from pre-attentive aspects and grouping [16], to coarse high-level attributes [1, 2, 33], to some recent work aimed at discovering such attributes by automatically mining descriptions of images from the Internet [3, 12]. However, the texture attributes investigated so far are rather few or too generic for a detailed description of most “real world” patterns. Our work is motivated by that of Bhushan et al. [5], who studied the relationship between commonly used English words and the perceptual properties of textures, identifying a set of words sufficient to describe a wide variety of texture patterns. While they study the psychological aspects of texture perception, the focus of this paper is the challenge of estimating such properties from images automatically.


Figure 2: The 47 texture words in the describable texture dataset introduced in this paper: banded, blotchy, braided, bubbly, bumpy, chequered, cobwebbed, cracked, crosshatched, crystalline, dotted, fibrous, flecked, freckled, frilly, gauzy, grid, grooved, honeycombed, interlaced, knitted, lacelike, lined, marbled, matted, meshed, paisley, perforated, pitted, pleated, polka-dotted, porous, potholed, scaly, smeared, spiralled, sprinkled, stained, stratified, striped, studded, swirly, veined, waffled, woven, wrinkled, zigzagged. Two examples of each attribute are shown to illustrate the significant amount of variability in the data.

Our first contribution is to select a subset of 47 describable texture attributes, based on the work of Bhushan et al., that capture a wide variety of visual properties of textures, and to introduce a corresponding describable texture dataset consisting of 5,640 texture images jointly annotated with the 47 attributes (Sect. 2). In an effort to directly support real-world applications, and inspired by datasets such as ImageNet [10] and the Flickr Material Dataset (FMD) [30], our images are captured “in the wild” by downloading them from the Internet rather than collecting them in a laboratory. We also address the practical issue of crowd-sourcing this large set of joint annotations efficiently, accounting for the co-occurrence statistics of attributes, the appearance of the textures, and the reliability of annotators (Sect. 2.1).

Our second contribution is to identify a gold-standard texture representation that achieves optimal recognition of the describable texture attributes in challenging real-world conditions. Texture classification has been widely studied in the context of recognizing materials, supported by datasets such as CUReT [9], UIUC [18], UMD [39], Outex [23], the Drexel Texture Database [24], and KTH-TIPS [7, 14]. These datasets address material recognition under variable occlusion, viewpoint, and illumination, and have motivated the creation of a large number of specialized texture representations that are invariant or robust to these factors [19, 23, 35, 36]. In contrast, generic object recognition features such as SIFT were shown to work best for material recognition in FMD, which, like DTD, was collected “in the wild”. Our findings are similar, but we also find that Fisher vectors [26] computed on SIFT features and certain color features can significantly boost performance. Surprisingly, these descriptors outperform specialized state-of-the-art texture representations not only in recognizing our describable attributes, but also on a variety of datasets for material recognition, achieving an accuracy of 63.3% on FMD and 67.5% on the KTH-TIPS2-b dataset (Sect. 3, 4.1).

Our third contribution consists of several applications of the proposed describable attributes. These can serve a complementary role for recognition and description in domains where the material is not important or is known ahead of time, such as fabrics or wallpapers. However, can these attributes improve other texture analysis tasks such as material recognition? We answer this question in the affirmative in a series of experiments on the challenging FMD and KTH datasets. We show that estimates of these properties, when used as features, can further boost recognition rates for material classification, achieving an accuracy of 53.1% on FMD and 64.6% on KTH when used alone as a 47-dimensional feature, and 65.4% on FMD and 74.6% on KTH when combined with SIFT and simple color descriptors (Sect. 4.2). These represent an absolute gain of more than 8% in accuracy over the previous state of the art; our 47-dimensional feature contributes between 2.2% and 7% to this gain. Furthermore, these attributes are easy to describe by design, hence they can serve as intuitive dimensions to explore large collections of texture patterns, e.g. product catalogs (wallpapers or bedding sets) or material datasets. We present several such visualizations in the paper (Sect. 4.3).

2. The describable texture dataset

This section introduces the Describable Textures Dataset (DTD), a collection of real-world texture images annotated with one or more adjectives selected from a vocabulary of 47 English words. These adjectives, or describable texture attributes, are illustrated in Fig. 2 and include words such as banded, cobwebbed, freckled, knitted, and zigzagged.

DTD investigates the problem of texture description, intended as the recognition of describable texture attributes. This problem differs from the one of material recognition considered in existing datasets such as CUReT, KTH, and FMD. While describable attributes are correlated with materials, attributes do not imply materials (e.g. veined may equally apply to leaves or marble) and materials do not imply attributes (not all marbles are veined). Describable attributes can be combined to create rich descriptions (Fig. 3; marble can be veined, stratified, and cracked at the same time), whereas a typical assumption is that textures are made of a single material. Describable attributes are subjective properties that depend on the imaged object as well as on human judgments, whereas materials are objective. In short, attributes capture properties of textures beyond materials, supporting human-centric tasks where describing textures is important. At the same time, they will be shown to be helpful in material recognition as well (Sect. 3.2 and 4.2).

DTD contains textures in the wild, i.e. texture images extracted from the web rather than being captured or generated in a controlled setting. Textures fill the images, so we can study the problem of texture description independently of texture segmentation. With 5,640 such images, this dataset aims at supporting real-world applications where the recognition of texture properties is a key component. Collecting images from the Internet is a common approach in categorization and object recognition, and was adopted for material recognition in FMD. This choice trades off the systematic sampling of illumination and viewpoint variations of datasets such as CUReT, KTH-TIPS, Outex, and Drexel for a representation of real-world variations, shortening the gap with applications. Furthermore, the invariance of describable attributes is not an intrinsic property as for materials, but reflects invariance in the human judgments, which should be captured empirically.

DTD is designed as a public benchmark, following the standard practice of providing 10 preset splits into equally-sized training, validation, and test subsets for easier algorithm comparison (these splits are used in all the experiments in the paper). DTD will be made publicly available on the web at [anonymized], along with standardized evaluation, as well as code reproducing the results in Sect. 4.

Related work. Apart from material datasets, there have been numerous attempts at collecting attributes of textures at a smaller scale or in controlled settings. Our work is related to that of [22], who analyzed images in the Outex dataset [23] using a subset of the attributes we consider. Their attributes were demonstrated to perform better than several low-level descriptors, but these were trained and evaluated on the same dataset; hence it is not clear whether their learned attributes generalize to other settings. In contrast, we show that (i) our texture attributes trained on DTD outperform their semantic attributes on Outex and (ii) they can significantly boost performance on a number of other material and texture benchmarks (Sect. 4.2).

2.1. Dataset design and collection

This section discusses how DTD was designed and collected, including: selecting the 47 attributes, finding at least 120 representative images for each attribute, collecting a full set of multiple attribute labels for each image in the dataset, and addressing annotation noise.

Selecting the describable attributes. Psychological experiments suggest that, while there are a few hundred words that people commonly use to describe textures, this vocabulary is redundant and can be reduced to a much smaller number of representative words. Our starting point is the list of 98 words identified by Bhushan, Rao, and Lohse [5]. Their seminal work aimed to achieve for texture recognition what color words have achieved for describing color spaces [4]. However, their work mainly focuses on the cognitive aspects of texture perception, including perceptual similarity and the identification of directions of perceptual texture variability. Since we are interested in the visual aspects of texture, we ignored words such as “corrugated” that are more related to surface shape properties, and words such as “messy” that do not necessarily correspond to visual features. After this screening phase we analyzed the remaining words and merged similar ones, such as “coiled”, “spiraled” and “corkscrewed”, into single terms. This resulted in a set of 47 words, illustrated in Fig. 2.

Bootstrapping the key images. Given the 47 attributes, the next step was collecting a sufficient number (120) of example images representative of each attribute. A very large initial pool of about a hundred thousand images was downloaded from Google and Flickr by entering the attributes and related terms as search queries. Amazon Mechanical Turk (AMT) was then used to remove low-resolution, poor-quality, or watermarked images, as well as images that were not almost entirely filled with a texture. Next, detailed annotation instructions were created for each of the 47 attributes, including a dictionary definition of each concept and examples of correct and incorrect matches. Votes from three AMT annotators were collected for the candidate images of each attribute, and a shortlist of about 200 highly-voted images was further manually checked by the authors to eliminate residual errors. The result was a selection of 120 key representative images for each attribute.

Sequential joint annotations. So far only the key attribute of each texture image is known, while any of the remaining 46 attributes may apply as well. Exhaustively collecting annotations for 46 attributes and 5,640 texture images was found to be too expensive. To reduce this cost we propose to exploit the correlation and sparsity of the attribute occurrences (Fig. 3). For each attribute q, twelve key images are annotated exhaustively and used to estimate the probability p(q′|q) that another attribute q′ could co-exist with q.


Figure 3: Quality of joint sequential annotations. Each bar shows the average number of occurrences of a given attribute in a DTD image. The horizontal dashed line corresponds to a frequency of 1/47, the minimum given the design of DTD (Sect. 2.1). The black portion of each bar is the amount of attributes discovered by the sequential procedure, using only 10 annotations per image (about one fifth of the effort required for exhaustive annotation). The orange portion shows the additional recall obtained by integrating CV in the process. Right: co-occurrence of attributes. The matrix shows the joint probability p(q, q′) of two attributes occurring together (rows and columns are sorted in the same way as the left image).

Then, for the remaining key images of attribute q, annotations are collected only for the attributes q′ with non-negligible probability p(q′|q) – in practice 4 or 5 – assuming that the remaining attributes do not apply. This procedure occasionally misses attribute annotations; Fig. 3 evaluates attribute recall by 12-fold cross-validation on the 12 exhaustive annotations, for a fixed budget of 10 annotations per image (instead of 47).

A further refinement is to suggest which attributes q′ to annotate based not just on q, but also on the individual appearance of an image ℓ_i. This was done by using the attribute classifiers learned in Sect. 4; after Platt's calibration [28] on a held-out test set, the classifier score c_q′(ℓ_i) ∈ R is transformed into a probability p(q′|ℓ_i) = σ(c_q′(ℓ_i)), where σ(z) = 1/(1 + e^{−z}) is the sigmoid function. By construction, Platt's calibration reflects the prior probability p(q′) ≈ p_0 = 1/47 of q′ on the validation set. To reflect the probability p(q′|q) instead, the score is adjusted as

$$p(q' \mid \ell_i, q) \;\propto\; \sigma\bigl(c_{q'}(\ell_i)\bigr) \times \frac{p(q' \mid q)}{1 - p(q' \mid q)} \times \frac{1 - p_0}{p_0},$$

and this adjusted probability is used to decide which attributes to annotate for each image. As shown in Fig. 3, for a fixed annotation budget this method increases attribute recall. Overall, with roughly 10 annotations per image it was possible to recover all of the attributes for at least 75% of the images, and to miss on average one attribute in four for another 20%, while keeping the annotation cost to a reasonable level.
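To make the adjustment concrete, the following is a minimal NumPy sketch of this attribute-suggestion step (variable and function names are hypothetical; returning the top 5 candidates is an assumption consistent with the 4–5 attributes mentioned above):

```python
import numpy as np

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def suggest_attributes(scores, p_cond, p0=1.0 / 47, top=5):
    """Rank attributes q' to annotate for one image whose key attribute is q.

    scores : (47,) Platt-calibrated classifier scores c_{q'}(l_i) for this image.
    p_cond : (47,) estimated co-occurrence probabilities p(q'|q) for the key attribute q.
    p0     : prior probability reflected by Platt calibration (1/47 in DTD).
    top    : how many candidate attributes to return.
    """
    # p(q'|l_i, q) is proportional to sigma(c) * p(q'|q)/(1-p(q'|q)) * (1-p0)/p0
    odds_adjust = p_cond / (1.0 - p_cond) * (1.0 - p0) / p0
    posterior = sigmoid(scores) * odds_adjust
    return np.argsort(-posterior)[:top]  # indices of the most likely co-occurring attributes
```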

Handling noisy annotations. So far it was assumed that annotators are perfect: deterministic and noise-free. This is not the case, in part due to the intrinsic subjectivity of describable texture attributes, and in part due to distracted, adversarial, or unqualified annotators. As commonly done, we address this problem by collecting the same annotation multiple times (five) using different annotators, and forming a consensus.

Beyond simple voting, we found that the method of [38] can effectively remove or down-weight bad annotators, improving agreement. This method models each annotator α_j as a classifier with a given bias and error rate. Then, given a collection a_qij ∈ {0, 1} of binary annotations for attribute q, image i, and annotator j, it tries to estimate simultaneously the ground-truth labels a_qi and the quality α_j of the individual annotators. The method is appealing as several quantities are easily interpretable. For example, the prior p(α_j) on annotators encodes how frequently we expect to find good and bad annotators (e.g. we found that 0.5% of them labeled images randomly). A major difference compared to the scenario considered in [38] is that, in our case, the key attribute of each image is already known. By incorporating this as an additional prior, the method can use the key attributes to implicitly benchmark and calibrate annotators. The final set of annotations {a_qi} is obtained by thresholding at 60% the (approximate) posterior marginal p(a_qi | {a_qij}), computed using variational inference – similar to requiring three out of five votes in the basic voting scheme. In general, we found most probabilities to be very close to 100% or 0%, suggesting that there is little residual noise in the process. We also inspected the top 30 images of each attribute ranked by simple voting and by this posterior marginal, and found the posterior-based ranking to be significantly improved.
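The model of [38] is fitted with variational inference; as a much-simplified, hypothetical stand-in, the sketch below only weights each annotator by agreement with the known key attributes and thresholds the weighted vote at 60%:

```python
import numpy as np

def weighted_consensus(votes, known, threshold=0.6):
    """Simplified stand-in for the annotator model of [38] (not the method itself).

    votes : (I, J) binary annotations for one attribute (I images x J annotators),
            with np.nan where an annotator did not label an image.
    known : (I,) ground-truth labels for images whose key attribute is known, np.nan elsewhere.
    Returns a consensus label per image by thresholding the reliability-weighted vote.
    """
    have_vote = ~np.isnan(votes)
    have_truth = ~np.isnan(known)

    # Per-annotator reliability: agreement rate with the known key-attribute labels.
    seen = have_vote & have_truth[:, None]
    agree = (votes == known[:, None]) & seen
    counts = seen.sum(axis=0)
    reliability = np.where(counts > 0, agree.sum(axis=0) / np.maximum(counts, 1), 0.5)

    # Reliability-weighted average of the available votes for every image.
    filled = np.where(have_vote, votes, 0.0)
    weighted = (filled * reliability[None, :]).sum(axis=1)
    total = (have_vote * reliability[None, :]).sum(axis=1)
    return weighted / np.maximum(total, 1e-12) >= threshold
```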

3. Texture representations

Given the DTD dataset developed in Sect. 2, this section moves on to the problem of designing a system that can automatically recognize the attributes of textures. Given a texture image ℓ, the first step is to compute a representation φ(ℓ) ∈ R^d of the image; the second step is to use a classifier such as a Support Vector Machine (SVM), ⟨w, φ(ℓ)⟩, to score how strongly the image ℓ matches a given perceptual category. We propose two such representations: a gold-standard low-level texture descriptor based on the improved Fisher Vector (Sect. 3.1) and a mid-level texture descriptor consisting of the describable attributes themselves (Sect. 3.2). The details of the classifiers are discussed in Sect. 4.

3.1. Improved Fisher vectors

This section introduces our gold-standard low-level texture representation, the Improved Fisher Vector (IFV) [27], and relates it to existing texture descriptors. We port IFV from the object recognition literature [27] and show that it substantially outperforms specialized texture representations (Sect. 4).

Given an image ℓ, the Fisher Vector (FV) formulation of [26] starts by extracting local SIFT [20] descriptors {d_1, . . . , d_n} densely and at multiple scales. It then soft-quantizes the descriptors by using a Gaussian Mixture Model (GMM) with K modes, prior probabilities π_k, mode means μ_k, and mode covariances Σ_k. Covariance matrices are assumed to be diagonal, but local descriptors are first decorrelated and optionally dimensionality-reduced by PCA. Then first and second order statistics are computed as

$$u_{jk} = \frac{1}{n\sqrt{\pi_k}} \sum_{i=1}^{n} q_{ik}\, \frac{d_{ji} - \mu_{jk}}{\sigma_{jk}}, \qquad v_{jk} = \frac{1}{n\sqrt{2\pi_k}} \sum_{i=1}^{n} q_{ik} \left[ \left( \frac{d_{ji} - \mu_{jk}}{\sigma_{jk}} \right)^{2} - 1 \right],$$

where j spans the descriptor dimensions and q_{ik} is the posterior probability of mode k given descriptor d_i, i.e. $q_{ik} \propto \exp\left[-\tfrac{1}{2}(d_i - \mu_k)^{\top} \Sigma_k^{-1} (d_i - \mu_k)\right]$. These statistics are then stacked into a vector (u_1, v_1, . . . , u_K, v_K). In order to obtain the improved version of the representation, the signed square root $\sqrt{|z|}\,\operatorname{sign} z$ is applied to its components and the vector is ℓ2-normalized.

At least two key ideas in IFV were pioneered in texture analysis: the idea of sum-pooling local descriptors was introduced by [21], and the idea of quantizing local descriptors to construct histograms of features was pioneered by [19] with their computational model of textons. However, three key aspects of the IFV representation were developed in the context of object recognition. The first one is the use of SIFT descriptors, originally developed for object matching [20], which are more distinctive than the local descriptors popular in texture analysis such as filter banks [13, 19, 36], local intensity patterns [23], and patches [35]. The second one is replacing histogramming with the more expressive FV pooling method [26]. And the third one is the use of the square-root kernel map [27] in the improved version of the Fisher Vector.

We are not the first to use SIFT or IFV in texture recognition. For example, SIFT was used in [29], and Fisher Vectors were used in [31]. However, neither work tested the standard IFV formulation [27], which is well tuned for object recognition, developing instead variations specialized for texture analysis. We were therefore somewhat surprised to discover that the off-the-shelf method surpasses these approaches (Sect. 4.1).

3.2. Describable attributes as a representation

The main motivation for recognizing describable attributes is to support human-centric applications, enriching the vocabulary of visual properties that machines can understand. However, once extracted, these attributes may also be used as texture descriptors in their own right. As a simple incarnation of this idea, we propose to collect the responses of the attribute classifiers trained on DTD into a 47-dimensional feature vector φ(ℓ) = (c_1(ℓ), . . . , c_47(ℓ)). Sect. 4 shows that this very compact representation achieves excellent performance in material recognition; in particular, combined with IFV (SIFT and color) it sets the new state of the art on KTH-TIPS2-b and FMD. In addition to contributing to the best results, our proposed attributes generate meaningful descriptions of the materials in KTH-TIPS2-b (aluminium foil: wrinkled; bread: porous).
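As a rough sketch of this idea (scikit-learn, with hypothetical function and array names; the actual training setup is described in Sect. 4), one-vs-rest attribute classifiers trained on encoded DTD images can be stacked into a 47-dimensional descriptor for any new texture image:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier

def train_attribute_bank(X_dtd, Y_dtd, C=1.0):
    """Train 47 one-vs-rest attribute classifiers on encoded DTD images.

    X_dtd : (N, d) feature vectors (e.g. IFV-encoded images) of DTD training data.
    Y_dtd : (N, 47) binary matrix; Y_dtd[i, a] = 1 if attribute a applies to image i.
    """
    return OneVsRestClassifier(LinearSVC(C=C)).fit(X_dtd, Y_dtd)

def dtd_descriptor(bank, X):
    """Map encoded images to 47-D vectors (c_1(l), ..., c_47(l)) of attribute scores."""
    return bank.decision_function(np.atleast_2d(X))
```

The resulting 47-D vectors can then be fed to a second classifier, e.g. for material recognition as in Sect. 4.2.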

4. Experiments

4.1. Improved Fisher Vectors for textures

This section demonstrates the power of IFV as a texture representation by comparing it to established texture descriptors. Most of these representations can be broken down into two parts: computing local image descriptors {d_1, . . . , d_n} and encoding them into a global image statistic φ(ℓ).

In IFV the local descriptors d_i are 128-dimensional SIFT features, capturing a spatial histogram of the local gradient orientations; here spatial bins have an extent of 6 × 6 pixels, and descriptors are sampled every two pixels and at scales 2^{i/3}, i = 0, 1, 2, . . . . We also evaluate as local descriptors the Leung and Malik (LM) [19] (48-D) and MR8 (8-D) [13, 36] filter banks, the 3 × 3 and 7 × 7 raw image patches of [35], and the local binary patterns (LBP) of [23].

Encoding maps image descriptors {d_1, . . . , d_n} to a statistic φ(ℓ) ∈ R^d suitable for classification. Encoding can be as simple as averaging (sum-pooling) descriptors [21], although this is often preceded by a high-dimensional sparse coding step. The most common coding method is to vector quantize the descriptors using an algorithm such as K-means [19], resulting in the so-called bag-of-visual-words (BoVW) representation [8]. Variations include soft quantization by a GMM in FV (Sect. 3.1) or specialized quantization schemes, such as mapping LBPs to uniform patterns [23] (LBPu; we use the rotation-invariant multiple-radii version of [22] for comparison purposes). For LBP, we also experiment with a variant (LBP-VQ) where standard LBPu2 is computed in 8 × 8 pixel neighborhoods, and the resulting local descriptors are further vector quantized using K-means and pooled, as this scheme performs significantly better in our experiments.

Table 1: Comparison of local descriptors and kernels on the DTD data, averaged over ten splits.

Local desc.   Linear        Hellinger     add-χ2        exp-χ2
MR8           15.9 ± 0.8    19.7 ± 0.8    24.1 ± 0.7    30.7 ± 0.7
LM            18.8 ± 0.5    25.8 ± 0.8    31.6 ± 1.1    39.7 ± 1.1
Patch 3×3     14.6 ± 0.6    22.3 ± 0.7    26.0 ± 0.8    30.7 ± 0.9
Patch 7×7     18.0 ± 0.4    26.8 ± 0.7    31.6 ± 0.8    37.1 ± 1.0
LBPu           8.2 ± 0.4     9.4 ± 0.4    14.2 ± 0.6    24.8 ± 1.0
LBP-VQ        21.1 ± 0.8    23.1 ± 1.0    28.5 ± 1.0    34.7 ± 1.3
SIFT          34.7 ± 0.8    45.5 ± 0.9    49.7 ± 0.8    53.8 ± 0.8
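For reference, a minimal sketch of the BoVW baseline described above (scikit-learn K-means; vocabulary size and inputs are placeholders): local descriptors pooled from training images are quantized into K visual words, and each image becomes a normalized histogram of word assignments.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(train_descriptors, k=1024, seed=0):
    """Cluster pooled local descriptors (e.g. dense SIFT) into k visual words."""
    return KMeans(n_clusters=k, random_state=seed, n_init=4).fit(train_descriptors)

def bovw_encode(vocabulary, descriptors):
    """Encode one image's local descriptors as a normalized bag-of-visual-words histogram."""
    words = vocabulary.predict(descriptors)  # nearest visual word for each descriptor
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```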

For each of the selected features, we experimented with several SVM kernels: the linear kernel $K(x', x'') = \langle x', x'' \rangle$, Hellinger's kernel $\sum_{i=1}^{d} \sqrt{x'_i x''_i}$, the additive-$\chi^2$ kernel $\sum_{i=1}^{d} x'_i x''_i / (x'_i + x''_i)$, and the exponential-$\chi^2$ kernel $\exp\bigl[-\lambda \sum_{i=1}^{d} (x'_i - x''_i)^2 / (x'_i + x''_i)\bigr]$, sign-extended as in [37]. In the latter case, λ is selected as one over the mean of the kernel matrix on the training set. The data is normalized so that K(x, x) = 1, as this is often found to improve performance. Learning uses a standard non-linear SVM solver, with validation used to select the parameter C in the range {0.1, 1, 10, 100} (the choice of C was found to have little impact on the result).
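A minimal NumPy sketch of these kernels for non-negative feature vectors (the sign-extension of [37] used for signed features such as Fisher vectors is omitted here):

```python
import numpy as np

def hellinger_kernel(X, Z):
    """K(x, z) = sum_i sqrt(x_i * z_i), for non-negative features X (n, d) and Z (m, d)."""
    return np.sqrt(X) @ np.sqrt(Z).T

def additive_chi2_kernel(X, Z, eps=1e-12):
    """K(x, z) = sum_i x_i * z_i / (x_i + z_i)."""
    num = X[:, None, :] * Z[None, :, :]
    den = X[:, None, :] + Z[None, :, :] + eps
    return (num / den).sum(axis=2)

def exp_chi2_kernel(X, Z, lam, eps=1e-12):
    """K(x, z) = exp(-lambda * sum_i (x_i - z_i)^2 / (x_i + z_i));
    the text selects lambda as one over the mean of the training kernel matrix."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2 / (X[:, None, :] + Z[None, :, :] + eps)).sum(axis=2)
    return np.exp(-lam * d2)
```

Precomputed kernel matrices such as these can then be passed to a standard SVM solver, e.g. sklearn.svm.SVC(kernel="precomputed").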

Local descriptor comparisons on DTD. This experiment compares local descriptors and kernels on DTD. All comparisons use the bag-of-visual-words pooling/encoding scheme, using K-means to vector quantize the descriptors. The DTD data is used as a benchmark, averaging the results over the ten train–val–test splits. K was cross-validated, finding an optimal setting of 1024 visual words for SIFT and color patches, 512 for LBP-VQ, and 470 for the filter banks. Tab. 1 reports the mean Average Precision (mAP) for the 47 SVM attribute classifiers. As expected, the best kernel is exp-χ2, followed by additive-χ2 and Hellinger, and then linear. Dense SIFT (53.82% mAP) outperforms the best specialized texture descriptor on the DTD data (39.67% mAP for LM). Fig. 4 shows the AP for each attribute: concepts like chequered achieve nearly perfect classification, while others such as blotchy and smeared are far harder.

Figure 4: Per-class AP of the 47 describable attribute classifiers on DTD using the IFV_SIFT representation and linear classifiers (mAP: 64.52).

Encoding comparisons on DTD. This experiment compares three encodings: BoVW, VLAD [15], and IFV. VLAD is similar to IFV, but uses K-means for quantization and stores only first-order statistics of the descriptors. Dense SIFT is used as the baseline descriptor, and performance is evaluated on the ten splits of DTD in Tab. 2. IFV (256 Gaussian modes) and VLAD (512 K-means centers) perform similarly (about 60% mAP) and significantly better than BoVW (53.82% mAP). As we will see next, however, IFV significantly outperforms VLAD on other texture datasets. We also experimented with the state-of-the-art descriptor of [32], which we did not find to be competitive with IFV on FMD and DTD (Tab. 2); unfortunately we could not obtain an implementation of [29] to try on our data – however, IFV_SIFT outperforms it on material recognition.

State-of-the-art material classification. This experiment evaluates the encodings on several material recognition datasets: CUReT [9], UMD [39], UIUC [18], KTH-TIPS [14], KTH-TIPS2 (a and b) [7], and FMD [30]. Tab. 2 compares with the existing state of the art [31, 32, 34] on each of them. For saturated datasets such as CUReT, UMD, UIUC, and KTH-TIPS the performance of most methods is above 99% mean accuracy and there is little difference between them. On harder datasets the advantage of IFV is evident: KTH-TIPS-2a (+5%), KTH-TIPS-2b (+3%), and FMD (+1%). In particular, while FMD includes manual segmentations of the textures, these are not used here. Furthermore, IFV is conceptually simpler than the multiple specialized features used in [31] for material recognition.

Left:

Dataset         IFV_SIFT     BoVW_SIFT    VLAD_SIFT    Published best       [32]
CUReT           99.6 ± 0.3   98.1 ± 0.9   98.8 ± 0.6   →                    99.4
UMD             99.2 ± 0.4   98.1 ± 0.8   99.3 ± 0.4   →                    99.7 ± 0.3
UIUC            97.0 ± 0.9   96.1 ± 2.4   96.5 ± 1.0   →                    99.4 ± 0.4
KTH-TIPS        99.7 ± 0.1   98.6 ± 1.0   99.2 ± 0.8   →                    99.4 ± 0.4
KTH-TIPS-2a α   82.5 ± 5.2   74.8 ± 5.4   76.5 ± 5.2   73.0 ± 4.7 [31]      –
KTH-TIPS-2b β   69.3 ± 1.0   58.4 ± 2.2   63.1 ± 2.1   66.3 [34]            –
FMD             58.2 ± 1.7   49.5 ± 1.9   52.6 ± 1.5   57.1 / 55.6 [29] γ   41.4 ± 1.3
DTD             61.5 ± 1.4   55.6 ± 1.3   59.8 ± 1.0   –                    40.2 ± 0.5

Right:

Feature                    KTH-TIPS-2b   FMD
DTD_LIN                    61.1 ± 2.8    48.9 ± 1.9
DTD_RBF                    64.6 ± 1.5    53.1 ± 2.0
IFV_SIFT                   69.3 ± 1.0    58.2 ± 1.7
IFV_RGB                    58.8 ± 2.5    47.0 ± 2.7
IFV_SIFT + IFV_RGB         67.5 ± 3.3    63.3 ± 1.9
DTD_RBF + IFV_SIFT         68.4 ± 1.4    60.1 ± 1.6
DTD_RBF + IFV_RGB          70.9 ± 3.5    61.3 ± 2.0
All three                  74.6 ± 3.0    65.4 ± 2.0
Prev. state of the art     66.3 [34]     57.1 [29]

Table 2: Left: Comparison of encodings and state-of-the-art texture recognition methods on DTD as well as standard material recognition benchmarks. α: three samples for training, one for evaluation; β: one sample for training, three for evaluation; γ: with/without ground-truth masks ([29] Sect. 6.5); our results do not use them. Right: Combined with IFV_SIFT and IFV_RGB, the DTD_RBF features achieve a significant improvement in classification performance on the challenging KTH-TIPS-2b and FMD compared to published state-of-the-art results.

4.2. Describable attributes as a representation

This section evaluates using the 47 describable attributes as a texture descriptor, applying it to the task of material recognition. The attribute classifiers are trained on DTD using the IFV_SIFT representation and linear classifiers as in the previous section (DTD_LIN). As explained in Sect. 3.2, these are then used to form 47-dimensional descriptors of each texture image in FMD and KTH-TIPS2-b.

When combined with a linear SVM classifier, the results are promising (Tab. 2): the describable attributes yield 61.1% mean accuracy on KTH-TIPS2-b and 49.0% on FMD, outperforming the aLDA model of [29], which combines color, SIFT, and edge-slice features (44.6%). While these results are not as good as the IFV_SIFT representation, the dimensionality of this descriptor is three orders of magnitude smaller than IFV. For this reason, using an RBF classifier with the DTD features is relatively cheap. Doing so improves the performance by 3.5–4% (DTD_RBF).

We also investigated combining multiple features: DTD_RBF with IFV_SIFT and IFV_RGB. IFV_RGB computes the IFV representation on top of all the 3 × 3 RGB patches in the image, in the spirit of [35]. The performance of IFV_RGB is notable given the simplicity of the local descriptors; however, it is not as good as DTD_RBF, which is also 26 times smaller. The combination of IFV_SIFT and IFV_RGB is already notably better than the previous state-of-the-art results, and the addition of DTD_RBF improves by another significant margin. Overall, our best result on KTH-TIPS-2b is 74.6% (vs. the previous best of 66.3%) and on FMD 65.4% (vs. 57.1%), an improvement of more than 8% accuracy in both cases.
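As one simple, hypothetical way to combine such descriptors (a sketch only, not necessarily the exact fusion behind the numbers above), the precomputed, individually normalized kernel matrices of the features can be averaged and a single SVM trained on the result:

```python
import numpy as np
from sklearn.svm import SVC

def combine_kernels(kernels, weights=None):
    """Weighted average of a list of precomputed (N, N) kernel matrices."""
    weights = np.ones(len(kernels)) / len(kernels) if weights is None else np.asarray(weights)
    return sum(w * K for w, K in zip(weights, kernels))

def train_combined(kernel_list, labels, C=10.0):
    """Train one SVM on the combined kernel, e.g. for DTD_RBF + IFV_SIFT + IFV_RGB."""
    return SVC(kernel="precomputed", C=C).fit(combine_kernels(kernel_list), labels)
```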

Finally, we compared the semantic attributes of [22] with DTD_LIN on the Outex data. Using IFV_SIFT as the underlying representation for our attributes, we obtain 49.82% mAP on the retrieval experiment of [22], which is not as good as their result with LBPu (63.3%). However, LBPu was developed on the Outex data, and it is therefore not surprising that it works so well. To verify this, we retrained our DTD attributes with IFV using LBPu as the local descriptor, obtaining a score of 64.52% mAP. This is remarkable considering that their retrieval experiment contains the data used to train their own attributes (target set), while our attributes are trained on a completely different data source. Tab. 1 shows that LBPu is not competitive on DTD.

4.3. Search and visualization

Fig. 5 shows that there is an excellent semantic correlation between the ten categories in KTH-TIPS-2b and the attributes in DTD. For example, aluminium foil is found to be wrinkled, while bread is found to be bumpy, pitted, porous, and flecked.

In what follows, we first experimented with describing images from FMD, a challenging material dataset; encouraged by the good results, we then applied the same technique to images in the wild, taken from an online catalog.

4.3.1 Subcategorizing FMD materials using describable texture attributes

The results shown in Fig. 6 extend the results in Table 2 and Sect. 4.2 and illustrate the classification performance of the 47-dimensional DTD descriptors on the FMD materials – note the excellent performance obtained for foliage, wood, and water, which are above 70%.

Our experiments illustrate how the DTD attributes can be used to find “semantic structures” in a dataset such as FMD, for example by distinguishing between “knitted vs pleated fabric”, “gauzy vs crystalline glass”, “veined vs frilly foliage”, etc. To do so, the FMD images of each material were clustered, based on their 47-dimensional attribute vectors, using K-means into 3–5 clusters each. Examples of the most meaningful clusters are shown in Fig. 8 along with the dominant attributes in each.
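A minimal sketch of this subcategorization step (scikit-learn K-means on a hypothetical (N, 47) array of attribute scores for one material; the cluster count and naming are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def subcategorize(dtd_scores, n_clusters=4, top_words=3, attribute_names=None, seed=0):
    """Cluster one material's images by their 47-D DTD attribute vectors.

    dtd_scores : (N, 47) attribute classifier scores for the images of one FMD material.
    Returns per-image cluster assignments and, per cluster, its dominant attributes.
    """
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(dtd_scores)
    dominant = {c: np.argsort(-km.cluster_centers_[c])[:top_words] for c in range(n_clusters)}
    if attribute_names is not None:
        dominant = {c: [attribute_names[i] for i in idx] for c, idx in dominant.items()}
    return km.labels_, dominant
```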

Figure 5: Descriptions of materials from the KTH-TIPS-2b dataset. These words are the most frequent top-scoring texture attributes (from the list of 47 we proposed) when classifying the images from the KTH-TIPS-2b dataset: aluminium (wrinkled); brown bread (bumpy, pitted, porous, flecked); corduroy (grooved); cork (knitted, woven, grooved); cotton (marbled, flecked); cracker (bumpy, blotchy); lettuce leaf (veined, pitted); linen (woven, braided); white bread (pitted, bumpy, blotchy, porous); wood (stratified, grooved); wool (knitted, fibrous).

Notable fine-grained material distinctions include knitted vs pleated fabrics and frilly vs pleated & veined foliage, which contain linear structures. In the latter case, veins often have a radial pattern, which is captured by the dominant spiralled attribute. The method distinguishes bumpy stones such as pebbles from porous or pitted stones in zoomed / detailed views of stone blocks. Water is divided into swirly & spiralled images, which show the orientation of the waves, and bubbly, sprinkled images, which show splashing drops. Glass is more challenging, but some images are correctly identified as crystalline. Fig. 7 shows other challenging examples illustrating the variety of materials and patterns that can be described by the DTD attributes. Metal is one of the hardest classes to identify (Fig. 6), but attributes such as “interlaced” and “braided” are still correctly recognized in the third (jewelry) and last (metal wires) images.

4.3.2 Examples in the wild

As an additional application of our describable texture attributes, we compute them on a large dataset of 10,000 wallpapers and bedding sets (about 5,000 for each of the two categories) from houzz.com. The 47 attribute classifiers are learned as explained in Sect. 4.1 using the IFV_SIFT representation and then applied to the 10,000 images to predict the strength of association between each attribute and each image. Classifier scores are recalibrated on a subset of the target data and converted to probabilities using Platt's method [28], for each individual attribute.
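A rough per-attribute sketch of this recalibration step (scikit-learn; it assumes a small labelled calibration subset of the target images, and uses a logistic fit on the raw score as a stand-in for Platt's sigmoid fit [28]):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def platt_calibrate(scores, labels):
    """Fit p = 1 / (1 + exp(-(A*s + B))) for one attribute on a labelled calibration subset.

    scores : (M,) raw SVM scores on the calibration images.
    labels : (M,) binary ground truth for this attribute on the same images.
    Returns a function mapping new raw scores to calibrated probabilities.
    """
    lr = LogisticRegression().fit(np.asarray(scores).reshape(-1, 1), labels)
    return lambda s: lr.predict_proba(np.asarray(s).reshape(-1, 1))[:, 1]

# Applied independently per attribute, so the 47 probabilities of an image need not sum to 1.
```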

Figure 6: Per-class AP results on the FMD dataset using DTD classification scores as features (DTD features with RBF kernel; mAP: 55.33%): metal 25.09, leather 39.86, glass 47.39, fabric 49.42, plastic 52.54, paper 52.94, stone 60.58, wood 71.15, water 74.25, foliage 80.06.

Fig. 11 and Fig. 12 show some example attribute predictions (excluding images used for calibrating the scores) for the best-scoring 3–4 images of each category, organized by top attribute. For each image we show the top three attributes; the top two are very accurate, while the third is correct in about half of the cases. Please note that each score is calibrated on a per-attribute basis, so the scores do not add up to 1.

5. Summary

We introduced a large dataset of 5,640 images collected “in the wild”, jointly labeled with 47 describable texture attributes, and used it to study the problem of extracting semantic properties of textures and patterns, addressing real-world human-centric applications. Looking for the best representation to recognize such describable attributes in natural images, we ported IFV, an object recognition representation, to the texture domain. Not only does IFV work best in recognizing describable attributes, it also outperforms specialized texture representations on a number of challenging material recognition benchmarks. We have shown that the describable attributes, while not designed to do so, are good predictors of materials as well, and that, when combined with IFV, they significantly outperform the state of the art on the FMD and KTH-TIPS recognition tasks.

References

[1] M. Amadasun and R. King. Textural features corresponding to textural properties. Systems, Man, and Cybernetics, 19(5), 1989.
[2] R. Bajcsy. Computer description of textured surfaces. In IJCAI. Morgan Kaufmann Publishers Inc., 1973.
[3] T. Berg, A. Berg, and J. Shih. Automatic attribute discovery and characterization from noisy web data. In ECCV, 2010.
[4] B. Berlin and P. Kay. Basic color terms: Their universality and evolution. Univ of California Press, 1991.
[5] N. Bhushan, A. Rao, and G. Lohse. The texture lexicon: Understanding the categorization of visual texture terms and their relationship to texture images. Cognitive Science, 21(2):219–246, 1997.
[6] L. Bourdev, S. Maji, and J. Malik. Describing people: A poselet-based approach to attribute classification. In ICCV, 2011.
[7] B. Caputo, E. Hayman, and P. Mallikarjuna. Class-specific material categorisation. In ICCV, 2005.
[8] G. Csurka, C. R. Dance, L. Dan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. In Proc. ECCV Workshop on Stat. Learn. in Comp. Vision, 2004.
[9] K. J. Dana, B. van Ginneken, S. K. Nayar, and J. J. Koenderink. Reflectance and texture of real world surfaces. ACM Transactions on Graphics, 18(1):1–34, 1999.
[10] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.

Figure 7: Challenging or difficult images which were correctly characterized by our DTD classifier (the top-scoring attributes shown include chequered, gauzy, smeared, knitted, lacelike, porous, interlaced, bubbly, wrinkled, spiralled, veined, striped, crystalline, and braided).

[11] A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth. Describing objects by their attributes. In CVPR, pages 1778–1785. IEEE, 2009.
[12] V. Ferrari and A. Zisserman. Learning visual attributes. In NIPS, 2007.
[13] J. M. Geusebroek, A. W. M. Smeulders, and J. van de Weijer. Fast anisotropic gauss filtering. IEEE Transactions on Image Processing, 12(8):938–943, 2003.
[14] E. Hayman, B. Caputo, M. Fritz, and J.-O. Eklundh. On the significance of real-world conditions for material classification. In ECCV, 2004.
[15] H. Jegou, M. Douze, C. Schmid, and P. Perez. Aggregating local descriptors into a compact image representation. In Proc. CVPR, 2010.
[16] B. Julesz. Textons, the elements of texture perception, and their interactions. Nature, 290(5802):91–97, March 1981.
[17] N. Kumar, A. Berg, P. Belhumeur, and S. Nayar. Describable visual attributes for face verification and image search. PAMI, 33(10):1962–1977, 2011.
[18] S. Lazebnik, C. Schmid, and J. Ponce. A sparse texture representation using local affine regions. PAMI, 28(8):2169–2178, 2005.
[19] T. Leung and J. Malik. Representing and recognizing the visual appearance of materials using three-dimensional textons. International Journal of Computer Vision, 43(1):29–44, 2001.
[20] D. G. Lowe. Object recognition from local scale-invariant features. In Proc. ICCV, 1999.
[21] J. Malik and P. Perona. Preattentive texture discrimination with early vision mechanisms. JOSA A, 7(5), 1990.
[22] T. Matthews, M. S. Nixon, and M. Niranjan. Enriching texture analysis with semantic data. In CVPR, June 2013.
[23] T. Ojala, M. Pietikainen, and T. Maenpaa. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. PAMI, 24(7):971–987, 2002.
[24] G. Oxholm, P. Bariya, and K. Nishino. The scale of geometric texture. In ECCV, pages 58–71. Springer Berlin/Heidelberg, 2012.
[25] G. Patterson and J. Hays. SUN attribute database: Discovering, annotating, and recognizing scene attributes. In CVPR, 2012.
[26] F. Perronnin and C. R. Dance. Fisher kernels on visual vocabularies for image categorization. In CVPR, 2007.
[27] F. Perronnin, J. Sanchez, and T. Mensink. Improving the Fisher kernel for large-scale image classification. In Proc. ECCV, 2010.
[28] J. C. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In A. Smola, P. Bartlett, B. Scholkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers. Cambridge, 2000.
[29] L. Sharan, C. Liu, R. Rosenholtz, and E. H. Adelson. Recognizing materials using perceptually inspired features. International Journal of Computer Vision, 103(3):348–371, 2013.
[30] L. Sharan, R. Rosenholtz, and E. H. Adelson. Material perception: What can you see in a brief glance? Journal of Vision, 9:784(8), 2009.
[31] G. Sharma, S. ul Hussain, and F. Jurie. Local higher-order statistics (LHS) for texture categorization and facial analysis. In Proc. ECCV, 2012.
[32] L. Sifre and S. Mallat. Rotation, scaling and deformation invariant scattering for texture discrimination. In CVPR, June 2013.
[33] H. Tamura, S. Mori, and T. Yamawaki. Textural features corresponding to visual perception. IEEE Transactions on Systems, Man, and Cybernetics, 8(6):460–473, June 1978.
[34] R. Timofte and L. Van Gool. A training-free classification framework for textures, writers, and materials. In BMVC, Sept. 2012.
[35] M. Varma and A. Zisserman. Texture classification: Are filter banks necessary? In CVPR, volume 2, pages II-691. IEEE, 2003.
[36] M. Varma and A. Zisserman. A statistical approach to texture classification from single images. IJCV, 62(1):61–81, 2005.
[37] A. Vedaldi and A. Zisserman. Efficient additive kernels via explicit feature maps. In CVPR, 2010.
[38] P. Welinder and P. Perona. Online crowdsourcing: rating annotators and obtaining cost-effective labels. In CVPR, 2010.
[39] Y. Xu, H. Ji, and C. Fermuller. Viewpoint invariant texture description using fractal analysis. IJCV, 83(1):85–100, June 2009.

Figure 8: Example meaningful clusters of FMD categories, obtained using K-means on DTD classification scores. Results are shown for fabric and glass; the dominant attributes of each cluster are overlaid – fabric (knitted), fabric (pleated), glass (bubbly, gauzy), glass (crystalline, bubbly) – and on each image we show the top 3 scoring texture words.

Figure 9: Continued from Fig. 8. Results on foliage, paper, and wood: foliage (pleated, spiralled, veined), foliage (frilly, sprinkled), paper (wrinkled, pleated), wood (cracked, veined, interlaced).

Figure 10: Continued from Fig. 9. Subcategories for stone and water images: stone (bumpy), stone (porous, pitted, flecked), water (bubbly, smeared), water (swirly, spiralled).

Figure 11: Example wallpaper images from an online catalog (houzz.com), each shown with its top three predicted attributes and their calibrated scores.

Figure 12: Example bedding sets from an online catalog (houzz.com), each shown with its top three predicted attributes and their calibrated scores.