

Photo-Quality Evaluation based on Computational Aesthetics: Review of Feature Extraction Techniques

Dimitris Spathis∗

Department of Informatics, Aristotle University of Thessaloniki, Greece

Abstract

Researchers try to model the aesthetic quality of photographs into low and high-level features, drawing inspiration from art theory, psychology and marketing. We attempt to describe every feature extraction measure employed in the above process. The contribution of this literature review is the taxonomy of each feature by its implementation complexity, considering real-world applications and integration in mobile apps and digital cameras. Also, we discuss the machine learning results along with some unexplored research areas as future work.

Keywords: Image Processing, Feature Extraction, Photograph Quality Assessment

1. Introduction

Quality assessment of photographs is not a new issue arising with digital cameras. Birkhoff, back in 1933, proposed that the aesthetic appeal of objects relates to the ratio of order and complexity in images; the difficulty in this task is defining order and complexity [4]. With the advent of digital cameras and smartphones, people capture more photographs than they can consume. Nowadays, social media provide an adequate filter of our social graph, displaying to us photographs and posts by our friends or friends of friends, based on popularity metrics. What if we could see photographs based on our taste, previous likes, or general aesthetic criteria? In the field of information retrieval, content-based image retrieval (CBIR) systems apply computer vision techniques to the image retrieval problem by searching in large databases. In this context, we do not search for metadata or documents; we decompose the image into its pixels, attempting to draw inferences between its content and its aesthetic value. This relatively recent research area is termed Computational Aesthetics. Researchers in this area leverage techniques from image processing and computer vision, combining methodologies from psychology and art theory [32, 22].

∗Semester project literature review for the MSc course Computer Vision by Ioannis Pitas. Email address: [email protected] (Dimitris Spathis)

arXiv:1612.06259v1 [cs.CV] 19 Dec 2016

The field of Computational Aesthetics gradually attracts interest in the scientific community. Scopus data (figure 1) reveal the steady growth in literature regarding this area. In 2014, 89 papers were published in international conferences and journals. If we dig more into the Scopus data, we find that the majority of publications comes from Asia (National University of Singapore, Universiti Tenaga Nasional and Zhejiang University) and North America (Simon Fraser University, Carnegie Mellon University, and Georgia Institute of Technology).

Figure 1: Papers on Scopus containing "Computational Aesthetic[s]" in abstract, title, or keywords, per year (1990-2015).

Researchers involved in aesthetic assessment of photographs try to correlate low or high-level attributes of an image with its quality evaluation. This process understandably involves subjectivity in gathering ground truth data. The proposed solution is to combine these extracted features with crowdsourced ratings of photographs, ideally from large populations [30].

But before we can design features to assess quality, we should decide on the perceptual criteria we use when judging photographs. The literature groups the judgment of photo quality into three factors: simplicity, realism and basic photography techniques [15].

Simplicity. One distinctive aspect of professional shots versus everyday snapshots is that professional photos are simple, with a foreground that is easy to separate from the background. Professionals distinguish the subject from the background by widening the lens aperture and by increasing the subject's natural color contrast and the lighting contrast.

Realism. Our snapshots usually look ordinary and quite common, while professional shots look surreal. To achieve this effect, professionals exploit the lighting conditions of dusk or dawn, choosing to shoot at hours when the sunlight is indirect. They also use hardware filters to make the sky bluer (ND filters), adjusting the color palette and saturation levels. Their hardware, in general, is more expensive and customizable than regular smartphones, allowing them to tweak the aperture, shutter speed, lens optical zoom and many other options. By leveraging the above techniques, professionals tend to shoot more unusual objects and situations.

Basic photography techniques. It is very rare for a professional photo to be entirely blurry. Blurriness is often the result of camera shake or a low-quality lens. Background blurriness, though, can be a deliberate professional effect, as described above. In addition, professionals usually shoot higher-contrast photos than point-and-shoot snapshot users.

Given the above grouping, researchers try to model these factors into features that can be extracted from images. We describe those features in section 2. The rest of the paper is structured as follows: in section 3 we describe the aesthetic benchmark datasets used in computational evaluation, in section 4 we assess the machine learning frameworks for predicting aesthetic value and, finally, in section 5 we discuss the limitations and future work.

2. Feature Extraction

In the last fifteen years there have been significant contributions to the field of image representation and feature extraction towards semantic understanding [14]. Aesthetics and emotional value are based on semantics, so it is not surprising that researchers draw inspiration from the above fields. In the literature we notice a spectrum of low-level (edges, textures, color histograms etc.) and high-level (rule of thirds, symmetry, saliency, face recognition etc.) features combined. We group all the features mentioned in the literature as follows: color, texture, composition, content. Also, we attempt a qualitative labelling of each feature according to its implementation complexity with current image processing tools and frameworks. Note that implementation complexity might overlap with computational complexity, particularly in data-intensive extraction techniques such as facial recognition.

2.1. Color

HSL and HSV color spaces. Hue, saturation and lightness (HSL) and hue, saturation and value (HSV) are the two most common cylindrical-coordinate representations of points in an RGB color model. They are used in the literature due to their better representation of color than RGB. Features often extract the average value of each channel in the whole image, or segments of it.

White Balance. Professional cameras have to take into account the color temperature of the light source. On the other hand, cheap cameras tend to shoot blue-ish photographs due to low-quality lenses. We estimate the average color temperature distribution.

Contrast. One of the most important aspects of a photograph is the difference in color luminance that makes an object distinguishable. We create a multi-scale contrast map from the brightness histogram of the photo and compute average and median values.

Pixel Intensity. Too much exposure often creates lower-quality pictures, and photos that are too dark are often also not appealing. Thus light exposure can often be a good discriminant between high and low quality photographs. However, an over- or under-exposed photograph may, under certain scenarios, yield very original and beautiful shots [8]. We use the average pixel intensity to estimate the use of light, where I_V is the Value channel of HSV at each row x, column y of an X-by-Y image.

f = (1/XY) Σ_{x=0}^{X−1} Σ_{y=0}^{Y−1} I_V(x, y)   (1)
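As a minimal illustration, Eq. (1) translates directly into code. The sketch below assumes the image has already been converted to HSV and is represented as nested Python lists of (h, s, v) triples; a real pipeline would use an image library for the conversion.

```python
def average_intensity(hsv_image):
    """Average of the V channel over an X-by-Y image, as in Eq. (1)."""
    total, count = 0.0, 0
    for row in hsv_image:
        for (_h, _s, v) in row:
            total += v
            count += 1
    return total / count

# A toy 2x2 "image": two dark pixels and two bright ones.
img = [[(0, 0, 0.2), (0, 0, 0.4)],
       [(0, 0, 0.6), (0, 0, 0.8)]]
print(average_intensity(img))  # 0.5
```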

Pleasure, arousal, dominance. Emotional coordinates based on saturation and brightness are estimated by [32] from their psychology experiments. The relationship between saturation and brightness is modelled as follows:

Pleasure = 0.69V + 0.22S (2)

Arousal = 0.31V + 0.60S (3)

Dominance = −0.76V + 0.32S (4)

where V, S are the Value and Saturation metrics of the HSV color model.

Color Templates. Several metrics are proposed by [18] in order to estimate the distance between the hue distribution and a certain color template. In particular, the Munsell color system provides efficient regions in a color wheel according to its position. In figure 2 we see the hue distribution models, where the gray color indicates the efficient regions that result in harmony.

Figure 2: Munsell color wheel templates. Gray areas are aesthetically efficient.

[35] proposed an improved metric that approximates the color distribution of the hue histogram by identifying the location of its peak value. The distance between the hue histogram distribution and the best-matching Munsell color template is expressed as:

D_k = Σ_{i=0}^{255} dist(H(i), R_k) · S(i)   (5)

dist(H(i), R_k) = { 0, if i in R_k;  H(i) · arc(i, R_k), otherwise }   (6)

where k = 1...7 is the type of color template (see fig 2 above), H(i) stands for the occurrence of hue value i in the histogram, dist(H(i), R_k) is zero when i is in the gray regions R_k of template T_k, arc(i, R_k) is the arc-length distance between i and the nearest border of the gray regions, and S(i) is the average saturation of hue value i, acting as a weight that discounts colors with low saturation.

Then, we should find which template best fits our test image. Ideally this is the color template with the minimum distance. However, some templates contain more gray than others, which would introduce bias. We overcome this by measuring the fitness of each color template:

R_k = n_k p_k / ||I_s||   (7)

where n_k is the total number of pixels in the gray regions, ||I_s|| the image size and p_k the inverse proportion of the gray region size in the template. Combining the ratio R_k and the distance D_k, we measure the similarity between the hue histogram and the color templates:

Sk = Dk exp(−σRk) (8)
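The computation in Eqs. (5)-(8) can be sketched as follows. The gray-region arcs below are hypothetical stand-ins for the actual Munsell templates of [18, 35], and the fitness term follows one reading of Eq. (7) (gray-region pixels weighted by the inverse gray proportion, normalised by image size); only the flow of the computation is meant to be indicative.

```python
import math

# Hypothetical gray-region layouts: each template is a list of (start, width)
# arcs on a 256-bin hue wheel; the real Munsell-based templates differ.
TEMPLATES = {
    "i": [(0, 18)],             # one narrow arc
    "I": [(0, 18), (128, 18)],  # two opposing narrow arcs
    "V": [(0, 64)],             # one wide arc
}

def in_region(i, regions):
    """Is hue bin i inside any gray arc?"""
    return any((i - s) % 256 < w for s, w in regions)

def arc_to_region(i, regions):
    """Shortest circular distance from hue bin i to the nearest gray border."""
    best = 256
    for s, w in regions:
        for border in (s, (s + w) % 256):
            d = abs(i - border) % 256
            best = min(best, d, 256 - d)
    return best

def template_similarity(hue_hist, sat, regions, sigma=0.05):
    """Distance D_k (Eqs. 5-6), fitness R_k (Eq. 7), similarity S_k (Eq. 8)."""
    # Eqs. (5)-(6): saturation-weighted arc distance of out-of-region hues.
    d_k = sum(hue_hist[i] * arc_to_region(i, regions) * sat[i]
              for i in range(256) if not in_region(i, regions))
    # Eq. (7): gray-region pixel count, weighted by the inverse gray
    # proportion, normalised by image size (one reading of the formula).
    n_pixels = sum(hue_hist)
    gray_frac = sum(w for _s, w in regions) / 256
    n_k = sum(hue_hist[i] for i in range(256) if in_region(i, regions))
    r_k = n_k * (1 / gray_frac) / n_pixels
    # Eq. (8)
    return d_k * math.exp(-sigma * r_k)
```

A histogram whose mass falls entirely inside a template's gray arcs yields D_k = 0 and hence S_k = 0, the best possible match under this scheme.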

Color moment. A group of standard measures which characterise the color distribution in an image, in the same way that central moments uniquely describe a probability distribution. We measure mean, standard deviation, skewness and kurtosis for each HSV channel.
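These statistics need no special machinery; a stdlib-only sketch for one channel (run once per H, S and V), assuming the channel is given as a flat list of pixel values:

```python
import math

def color_moments(channel):
    """Mean, standard deviation, skewness and kurtosis of one color channel."""
    n = len(channel)
    mean = sum(channel) / n
    var = sum((x - mean) ** 2 for x in channel) / n
    std = math.sqrt(var)
    if std == 0:  # constant channel: shape moments are undefined, report 0
        return mean, 0.0, 0.0, 0.0
    skew = sum(((x - mean) / std) ** 3 for x in channel) / n
    kurt = sum(((x - mean) / std) ** 4 for x in channel) / n
    return mean, std, skew, kurt
```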

Colourfulness. [8] proposed a robust method to find the relative color distribution, distinguishing colorful images from monochromatic and low-contrast ones. They employed the Earth Mover's Distance (EMD), a measure of similarity between any two weighted distributions. Dividing the color space into n cubic blocks with four equal partitions along each dimension, they take each such cube as a sample point. Distribution D1 is generated as the color distribution of a hypothetical image such that for each of the n sample points the frequency is 1/n. Distribution D2 is computed from the test image by finding the frequency of occurrence of color within each of the n cubes. But EMD requires the pairwise distance between the two points under comparison, so we compute the pairwise Euclidean distances between the geometric centers C of each cube. Finally, the colourfulness measure is estimated as follows:

f = emd(D1, D2, {d(a, b) | 0 ≤ a, b ≤ n−1}),  where d(a, b) = ||C_a − C_b||   (9)


Colour Names. We consider that every pixel of an image can be assigned to one of the following groups: black, blue, brown, green, gray, orange, pink, purple, red, white, yellow. The algorithm proposed by [33] mimics the way humans judge a whole photo by its color. This measure also lets us infer the style of the photographer.
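A toy version of colour-name assignment, using hypothetical fixed RGB prototypes rather than the mapping learned from real-world data in [33]:

```python
# Hypothetical RGB prototypes for the eleven colour names; [33] learns this
# mapping from data instead of hard-coding it.
PROTOTYPES = {
    "black": (0, 0, 0), "blue": (0, 0, 255), "brown": (139, 69, 19),
    "green": (0, 128, 0), "gray": (128, 128, 128), "orange": (255, 165, 0),
    "pink": (255, 192, 203), "purple": (128, 0, 128), "red": (255, 0, 0),
    "white": (255, 255, 255), "yellow": (255, 255, 0),
}

def colour_name(pixel):
    """Assign an RGB pixel to its nearest colour-name prototype."""
    return min(PROTOTYPES,
               key=lambda name: sum((p - q) ** 2
                                    for p, q in zip(PROTOTYPES[name], pixel)))

def colour_name_histogram(pixels):
    """Fraction of pixels per colour name, over a flat RGB pixel list."""
    counts = {name: 0 for name in PROTOTYPES}
    for px in pixels:
        counts[colour_name(px)] += 1
    n = len(pixels)
    return {name: c / n for name, c in counts.items()}
```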

Feature          Description                      Proposed by           Implementation
HSL, HSV         hue, saturation, lightness       -                     easy
White Balance    color temperature                -                     easy
Contrast         multi-scale contrast map         -                     easy
Pixel Intensity  use of light of the V channel    Datta et al.          easy
Pleasure/Ar/Dom  emotional coordinates            Valdez et al.         easy
Color Templates  Munsell templates distance       Li et al.             medium
Color Moment     color statistics                 Datta et al.          easy
Colourfulness    color diversity distance         Datta et al.          medium
Colour Names     amount of specific colors        van de Weijer et al.  easy
Bags-of-color    color harmony in local regions   Nishiyama et al.      hard
Itten Contrasts  warm, cold contrasts             Itten                 medium
Dark Channel     detect haze area                 He et al.             medium

Table 1: Color Features

Bags-of-color. [23] proposed that a photograph can be seen as a collection of local regions with color variations that are relatively simple. Drawing inspiration from the bags-of-words models used in NLP, they coined the term bags-of-color-patterns. In particular, the proposed method:

1. samples local regions of a photo using grid sampling, distinguishing uniform regions from regions around edges and corners. Segmentation is done by mean shift, and color boundary detection by discriminant analysis.

2. describes each local region by features based on color harmony in the Munsell color space.

3. quantizes these features into visual words in codebooks, using k-means clustering.

4. represents the photo as a histogram of quantized features.
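Steps 3-4 above can be sketched as follows, assuming the per-region colour features and a k-means codebook have already been computed (steps 1-2 are omitted):

```python
def quantize(feature, codebook):
    """Index of the nearest codeword (visual word) for one region feature."""
    return min(range(len(codebook)),
               key=lambda k: sum((f - c) ** 2
                                 for f, c in zip(feature, codebook[k])))

def bag_of_patterns(region_features, codebook):
    """Quantize each region feature and build a normalised histogram."""
    hist = [0.0] * len(codebook)
    for feat in region_features:
        hist[quantize(feat, codebook)] += 1
    total = len(region_features)
    return [h / total for h in hist]
```

The resulting histogram is the fixed-length photo representation that downstream classifiers consume.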

Itten Contrasts. A concept from art theory. [13] studied the usage of color in art and formalized contrast concepts for combining colors that trigger an emotional effect in the observer. Itten proposed a spherical harmony model which can be interpreted in many ways to extract harmonic colors, such as opposing colors in a rectangle or triangle. The features we measure are the average contrast of brightness, contrast of saturation, contrast of hue, contrast of complements, contrast of warmth, harmony, hue count, hue spread, area of warm, area of cold, and the maximum of each. [22] provides a detailed estimation process.

Dark Channel. The haze effect was described by [11] via the dark channel method, which identifies the ill-focused or dull color layout in many amateur photos that suffer from an effect resembling a cloud of haze over the image. They observed that in haze-free areas at least one color channel has pixels with intensity close to zero. [35] improved this feature by taking into account the reduced depth of field of professionals and the foreground-background change.

2.2. Texture

Tamura. A set of features proposed by [31] based on psychological experiments. They are successfully used in affective image retrieval. The measure computes distances of notable spatial variations of grey levels. The most common Tamura features used in the literature are coarseness, contrast and directionality.

Edges. Edge detection is one of the most common image processing tasks for semantic understanding. In our literature it is widely accepted and implemented in multiple forms, mainly Canny and Sobel filters. In some cases [25], where it was used for the aesthetics of architecture images, a line-specific edge detection filter is employed in order to estimate the optimal angles of buildings, measuring vertical, horizontal and nondirectional edges. Generally, we expect the edges in professional photos to be clustered near the center of the image.
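A compact sketch of Sobel edge extraction plus the centrality heuristic mentioned above; the magnitude threshold and the central-third window are illustrative choices, not values from the cited works:

```python
def sobel_magnitude(gray):
    """Sobel gradient magnitude of a grayscale image (list of rows)."""
    h, w = len(gray), len(gray[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (gray[y-1][x+1] + 2*gray[y][x+1] + gray[y+1][x+1]
                  - gray[y-1][x-1] - 2*gray[y][x-1] - gray[y+1][x-1])
            gy = (gray[y+1][x-1] + 2*gray[y+1][x] + gray[y+1][x+1]
                  - gray[y-1][x-1] - 2*gray[y-1][x] - gray[y-1][x+1])
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag

def central_edge_fraction(gray, threshold=1.0):
    """Fraction of edge pixels falling in the central third of the frame."""
    mag = sobel_magnitude(gray)
    h, w = len(gray), len(gray[0])
    edges = [(y, x) for y in range(h) for x in range(w)
             if mag[y][x] > threshold]
    if not edges:
        return 0.0
    central = [1 for y, x in edges
               if h / 3 <= y < 2 * h / 3 and w / 3 <= x < 2 * w / 3]
    return sum(central) / len(edges)
```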

Feature             Description                               Proposed by       Implementation
Tamura              coarseness, contrast, directionality      Tamura et al.     easy
Edges               Canny / Sobel                             -                 easy
Spatial Envelope    Gabor filters                             Oliva et al.      hard
Wavelet Blurriness  wavelet textures                          Datta et al.      medium
GLCO Matrix         contrast/correlation/energy/homogeneity   Machajdik et al.  easy
Visual words        interest points                           Bay et al.        hard

Table 2: Texture Features

Spatial Envelope. [24] proposed the GIST measure as a low-level scene descriptor. A set of dimensions that represent the high-level structure of a scene (naturalness, openness, roughness, expansion, ruggedness) is estimated using spectral information and coarse localization. In particular, the image is segmented into a 4x4 grid, and a histogram of gradients is computed for each region and color channel.

Wavelet Blurriness. [8] proposed that the use of texture is a skill in photography. By measuring the Daubechies wavelet transform they estimate spatial smoothness. They performed a 3-level wavelet transform on all three color channels of HSV.


Gray-Level Co-Occurrence Matrix. A classic method for measuring texture changes. We measure contrast, correlation, energy and homogeneity for the Hue, Saturation and Brightness channels.
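A stdlib sketch of the co-occurrence statistics for one quantized channel and a single (0, 1) horizontal offset; correlation is computed analogously from the matrix marginals and is omitted here for brevity:

```python
def glcm_features(gray, levels=4):
    """Normalised co-occurrence matrix for horizontally adjacent pixels,
    plus contrast, energy and homogeneity computed from it. `gray` holds
    integer levels in [0, levels)."""
    h, w = len(gray), len(gray[0])
    glcm = [[0.0] * levels for _ in range(levels)]
    pairs = 0
    for y in range(h):
        for x in range(w - 1):
            glcm[gray[y][x]][gray[y][x + 1]] += 1
            pairs += 1
    for i in range(levels):
        for j in range(levels):
            glcm[i][j] /= pairs
    contrast = sum(glcm[i][j] * (i - j) ** 2
                   for i in range(levels) for j in range(levels))
    energy = sum(p * p for row in glcm for p in row)
    homogeneity = sum(glcm[i][j] / (1 + abs(i - j))
                      for i in range(levels) for j in range(levels))
    return contrast, energy, homogeneity
```

A perfectly uniform texture concentrates all mass on the matrix diagonal, giving zero contrast and maximal energy and homogeneity.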

Visual words. In order to capture the spatial information of an image, we use a popular approach that exploits local edges in images to create visual words. The local edges correspond to changes in intensity and are considered interest points. By following the codebook approach as in Bags-of-color, the image can be considered a bag of visual words. In the literature [25], the Speeded Up Robust Features (SURF) framework [2], based on Haar wavelets, is used.

2.3. Composition

Level of Detail. Images with detail generally trigger different emotional effects than minimal shots. To measure this feature, [22] counted the number of regions after a waterfall segmentation. That way, we can distinguish between cluttered images and simpler ones.

Blur. [15] modelled the blurriness of a photo as the result of a Gaussian smoothing filter applied to an otherwise sharp image. The challenge is to recover the smoothing parameter given only the blurred image; image quality is inversely proportional to this parameter. We can estimate the maximum frequency of the blurred image by taking its 2D Fourier transform and counting the number of frequencies whose power is greater than some threshold. The final quality of our test image is estimated by the ratio of this set of frequencies to the image size.
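The frequency-counting idea can be sketched with a naive 2D DFT. This is O((XY)^2) and only illustrative: a practical implementation would use an FFT, and the power threshold below is an arbitrary placeholder rather than the one used in [15].

```python
import cmath

def dft2_power(gray):
    """Naive 2D DFT power spectrum of a small grayscale image."""
    h, w = len(gray), len(gray[0])
    power = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    acc += gray[y][x] * cmath.exp(
                        -2j * cmath.pi * (u * y / h + v * x / w))
            power[u][v] = abs(acc) ** 2
    return power

def blur_quality(gray, threshold=1e-6):
    """Share of frequencies with non-negligible power (sharpness proxy)."""
    power = dft2_power(gray)
    h, w = len(gray), len(gray[0])
    above = sum(1 for row in power for p in row if p > threshold)
    return above / (h * w)
```

A sharp step image spreads energy over more frequencies than a flat one, so it scores higher under this proxy.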

Dynamic Lines. [13] noted that lines in images induce emotional triggers. Horizons are depicted by horizontal lines and communicate calmness and relaxation, while vertical lines are clear and straightforward. Leaning lines communicate dynamism or uncertainty, and line length and thickness amplify these effects. Using the Hough transform we can detect those patterns and classify them as static or slant.
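Once the Hough transform has produced line angles, the classification step reduces to a simple rule; the 5-degree tolerance below is an assumption, not a value from [13]:

```python
def classify_line(theta_deg, tol=5.0):
    """Label a detected line by its angle: near-horizontal and
    near-vertical lines are 'static', leaning ones are 'slant'."""
    a = theta_deg % 180
    if min(a, 180 - a) <= tol or abs(a - 90) <= tol:
        return "static"
    return "slant"
```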

Shape Convexity. Humans tend to display different emotions when shown rounded or convex objects [8]. We segment our image into patches and compute their convex hulls. For a segment to be considered a perfect convex shape, its segment-to-convex-hull area ratio must be 1. The segmentation process is the most critical part of the success of this algorithm. The final feature is the fraction of the image covered by approximately convex-shaped homogeneous regions.

Rule of Thirds. Photographers, trying to emulate the golden ratio in their shots, tend to place their subjects at one of the four intersections of the inner rectangle of the viewfinder. This implies that a large part of the interest points often lies on the periphery or inside the inner rectangle. [8] modelled this feature as:

f = (9/XY) Σ_{x=X/3}^{2X/3} Σ_{y=Y/3}^{2Y/3} I_H(x, y)   (10)


with I_H being the Hue channel of the HSV color model. Respectively, we measure the S and V channels.
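Eq. (10) translates almost directly into code. The sketch assumes the channel is given as a nested list; like the formula, it roughly averages the inner ninth of the frame, scaled by 9/XY.

```python
def rule_of_thirds(channel):
    """Eq. (10): scaled sum of a channel over the inner third of the frame.
    `channel` is an X-by-Y list of rows, e.g. the Hue plane; the same
    quantity is computed for the S and V planes."""
    X, Y = len(channel), len(channel[0])
    total = sum(channel[x][y]
                for x in range(X // 3, 2 * X // 3 + 1)
                for y in range(Y // 3, 2 * Y // 3 + 1))
    return 9 * total / (X * Y)
```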

Uniqueness and Familiarity. We humans learn to judge the aesthetics of pictures from the experience gathered by seeing other pictures; our opinions are often predefined by what we have seen in the past. When we see something rare we perceive it very differently from normal situations. We model this uniqueness feature as the integrated region matching (IRM) image distance [18]. The IRM distance computes image similarity by using color, texture and shape information from automatically segmented regions, and performing a robust region-based matching with other images. Given our datasets, we can compare unseen test photos with our aesthetically labelled ground truth in order to decide how unique they are compared to it.

Size and Aspect Ratio. These simple features measure the width and height of our image. The aspect ratio, especially, is important for optimal aesthetic cropping [3, 5, 17].

Feature                Description                               Proposed by           Implementation
Level of Detail        waterfall segmentation                    Machajdik et al.      easy
Blur                   magnitude-based frequency                 Ke et al.             hard
Dynamic Lines          static, dynamic, thick lines              Itten                 easy
Shape Convexity        object roundness                          Datta et al.          medium
Rule of Thirds         inner rectangle                           Datta et al.          easy
Uniqueness             integrated region matching                Li et al.             hard
Size, Aspect Ratio     width, height                             -                     easy
Image Complexity       Shannon entropy                           Romero et al.         easy
Processing Complexity  Kolmogorov complexity, Zurek entropy      Machado et al.        medium
Visual weight ratio    geometric context                         Hoiem et al.          easy
Foreground position    foreground and rule of thirds             Bhattacharya et al.   easy
Salient regions        interest regions                          Wong et al.           hard
Graphlets              regions as graphs                         Zhang et al.          hard
Pyramid of HOG         self-similarity, complexity, anisotropy   Redies et al.         medium
Symmetry               HOG                                       Dalal et al.          easy

Table 3: Composition Features

Image and Processing Complexity. These features are inspired by Birkhoff's idea that the aesthetic appeal of objects relates to the ratio of order and complexity. The optimal image has both low image complexity and low processing complexity [27, 28, 21]. The idea states that compression error correlates with complexity. They employ Shannon's entropy to measure the image complexity, while the processing complexity is estimated via the Kolmogorov complexity.
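The Shannon-entropy part is straightforward to sketch over a gray-level histogram (Kolmogorov complexity, by contrast, is uncomputable and approximated in practice, e.g. via compression ratios):

```python
import math

def image_entropy(gray):
    """Shannon entropy (bits) of the gray-level histogram of an image."""
    flat = [p for row in gray for p in row]
    n = len(flat)
    counts = {}
    for p in flat:
        counts[p] = counts.get(p, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A flat image has entropy 0, while an image split evenly between two gray levels has exactly 1 bit.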

Visual Weight Ratio. [12] proposed a method to recover the surface layout from an outdoor image using geometric context. The scene is segmented into sky regions, ground regions, and vertically standing objects, using AdaBoost on a variety of low-level image features. We take the vertically standing objects as subject areas. That way we distinguish between clear, cloudy and sunset skies. The ratios between the areas of these foreground-background rectangles should be close to the golden ratio for better appeal.

Foreground Position. [3] modelled the relative foreground position as the normalized Euclidean distance between the foreground's center of mass (visual attention center) and each of four symmetric stress points (rule-of-thirds intersections) in the image frame. This metric is ideal for single-subject scenes such as animals or portraits, but might be ineffective in landscapes or seaside views which do not have a clear foreground. For these situations we choose the above technique of visual weight ratio.

Salient regions. Saliency of a subject is the quality by which it stands out compared to its neighbors. [37] extract the salient regions from an image by utilizing a visual saliency model. We assume that the salient regions contain the photo subject. We first find the salient locations in the original image, compute the saliency map and the segmented image, and end up with the salient mask based on the main salient locations. Our target is to enhance those salient regions as foreground objects [38]. This saliency retargeting problem (fig 3) is solved by Sequential Quadratic Programming (SQP).

Figure 3: Saliency retargeting process: (a), (d) Original image and its saliency map. (b), (e) Globally-enhanced image and its saliency map. (c), (g) Image enhanced by saliency retargeting and its saliency map. (f) Object segments, where objects A and B are in decreasing order of importance. Figure retrieved from [38].

Graphlets. There are usually many components within a photo. Among these components, a few spatially neighboring ones and their interactions capture the photo's local aesthetics. Since a graph is a powerful tool to describe the relationships between objects, [39] used graphs to model the spatial interactions between image components. Their technique is to segment a photo into a set of atomic regions using unsupervised fuzzy clustering and, based on this, extract graphlets. A graphlet is a small connected graph defined as G = (V, E), where V is a set of vertices representing locally distributed atomic regions and E is a set of edges, each of which connects pairwise spatially adjacent atomic regions. The graphlets are then projected onto a manifold, and the authors subsequently propose an embedding algorithm.

Pyramid of Histograms of Orientation Gradients. [26] proposed a model based on PHOG, following a pyramid approach to calculate the HOG. HOG values are calculated from the maximum gradient magnitudes across the color channels. Based on the resulting gradient image, we estimate high self-similarity, moderate complexity and low anisotropy.

Symmetry. [30] measured the symmetry of an image based on the difference of the Histogram of Oriented Gradients (HOG) [7] between the left half of the image and its mirrored right half.
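A much simplified proxy of this measure, comparing gradient-magnitude histograms of the left half and the mirrored right half rather than full HOG descriptors; a mirror-symmetric image scores 0, and the bin count is an arbitrary choice:

```python
def symmetry_score(gray, bins=8):
    """L1 distance between gradient-magnitude histograms of the left half
    and the mirrored right half of a grayscale image. 0 means symmetric."""
    h, w = len(gray), len(gray[0])

    def grad_hist(cols):
        hist = [0.0] * bins
        mags = []
        for y in range(1, h - 1):
            for x in cols:
                gx = gray[y][x + 1] - gray[y][x - 1] if 0 < x < w - 1 else 0
                gy = gray[y + 1][x] - gray[y - 1][x]
                mags.append((gx * gx + gy * gy) ** 0.5)
        top = max(mags) or 1.0          # avoid division by zero
        for m in mags:
            hist[min(int(bins * m / top), bins - 1)] += 1
        return [v / len(mags) for v in hist]

    left = grad_hist(range(w // 2))
    right = grad_hist(range(w - 1, w // 2 - 1, -1))  # mirrored column order
    return sum(abs(a - b) for a, b in zip(left, right))
```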

2.4. Content

Faces. Recognition of faces inside an image is still an open research field; researchers in Computational Aesthetics borrow the advancements of this field in order to count the number of frontal faces, the relative size of the biggest face, as well as the shadow area on faces [20]. The implementation used is usually the state-of-the-art algorithm proposed by [34].

Skin. Along with faces, skin color recognition is important in order to identify photographs containing humans. We estimate the number of skin pixels and the relative amount of skin with respect to the size of faces. The key concept is to detect the region of color space, around pink hues, that corresponds to human skin [19].
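A common RGB rule-of-thumb for skin pixels can serve as a sketch here; the thresholds below are one of several published heuristics and not the skin model of [19]:

```python
def is_skin(r, g, b):
    """Heuristic RGB test for skin-colored pixels (one common rule set)."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)

def skin_fraction(pixels):
    """Share of skin-colored pixels in a flat RGB pixel list."""
    return sum(1 for p in pixels if is_skin(*p)) / len(pixels)
```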

3. Benchmark Datasets

The above features are usually extracted from ground-truth aesthetic images found in various datasets created for this task. Researchers test their techniques with the following benchmark datasets: DPChallenge, Photo.net, Flickr, Terragalleria, ALIPR and AVA [14].

DPChallenge1 allows users to participate and contest in theme-based photography on diverse themes such as life and death, portraits, animals, geology and street photography. Peer rating on overall quality, on a 1-10 scale, determines the contest winners. (16,509 images)

Photo.Net2 is a platform for photography enthusiasts to share and have their pictures peer-rated on a 1-7 scale of aesthetics. The photography community also provides discussion forums, reviews of photos and photography products, and galleries for members and casual surfers. (20,278 images)

Flickr3 is one of the most popular online photo-sharing sites in the world. According to Flickr, the interestingness of a picture is dynamic and depends on a plethora of criteria including its photographer, who marks it as a favorite, and the comments and tags given by the community.

1 dpchallenge.com
2 photo.net
3 flickr.com


Terragalleria4 displays the travel photography of Quang-Tuan Luong (a scientist and a photographer), and is one of the finest resources for U.S. national park photography on the Web. All photographs here have been taken by one person (unlike other benchmarks), but multiple users have rated them on overall quality on a 1-10 scale.

ALIPR5 is a Web-based image search and tagging system that also allows users to rate photographs along 10 different emotional categories such as surprising, amusing and adorable.

AVA6 is the biggest aesthetics dataset, containing a rich variety of metadata including a large number of aesthetic scores for each image, semantic labels for over 60 categories, as well as labels related to photographic style. (250,000 images)

4. Machine Learning

Combining the above ground-truth aesthetic datasets with the features extracted from their images, researchers employ machine learning techniques to predict the aesthetic value of an image. In this process, both classification and regression are performed, for different desired results. With classification we can distinguish whether a photo is aesthetic or not, but such a binary verdict tends to be too absolute for a subjective task. Aesthetics is a more abstract quality, and quantifying it requires a relatively larger numeric scale. Thus, regression techniques provide a scale of aesthetic scores for each photo, so we are able to explain why one photo is considered of better quality than another. SVM, SVR and classification and regression trees (CART) are the most common methods used for classification and regression. In addition, forms of unsupervised learning like k-means clustering are used for visual vocabulary generation and graph-based region segmentation.
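To make the regression setting concrete, here is a toy k-nearest-neighbour regressor over extracted feature vectors. Published systems use SVR or CART instead, so this only illustrates the input/output shape of the task: feature vectors in, a continuous aesthetic score out.

```python
def knn_aesthetic_score(query, train_features, train_scores, k=3):
    """Predict an aesthetic score as the mean score of the k nearest
    training photos in feature space (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(query, feat)), score)
        for feat, score in zip(train_features, train_scores))
    nearest = dists[:k]
    return sum(score for _d, score in nearest) / len(nearest)
```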

The state of the art achieved on the above datasets varies with the implementation and the training dataset size, making it difficult to compare the accuracy results of the proposed techniques. The best reported accuracy in most cases is around 80% [8, 35, 39].

5. Discussion

The field of Computational Aesthetics seems promising, considering the growing interest from the research community. Researchers try to model the aesthetic quality of photographs into low and high-level features, drawing inspiration from art theory, psychology and marketing. There are many quite imaginative applications of computational aesthetics, such as coral reef image aesthetics evaluation [10], optimal text placement inside an image [18], better route suggestion in a city based on surroundings [25] or personality prediction [29]. Some underexplored areas proposed by this review are: personalization feedback in predictions, non-photographic images, black-and-white artistic considerations, and EXIF camera metadata.

4 terragalleria.com
5 wang.ist.psu.edu/IMAGE
6 lucamarchesotti.com

6. References


[1] Battiato, S., Moltisanti, M., Ravi, F., Bruna, A. R., Naccari, F. (2013, February). Aesthetic scoring of digital portraits for consumer applications. In IS&T/SPIE Electronic Imaging (pp. 866008-866008). International Society for Optics and Photonics.

[2] Bay, H., Tuytelaars, T., Van Gool, L. (2006). SURF: Speeded up robust features. In Computer Vision – ECCV 2006 (pp. 404-417). Springer Berlin Heidelberg.

[3] Bhattacharya, S., Sukthankar, R., Shah, M. (2010, October). A framework for photo-quality assessment and enhancement based on visual aesthetics. In Proceedings of the 18th ACM international conference on Multimedia (pp. 271-280). ACM.

[4] Birkhoff, G. D. (1933). Aesthetic measure. Cambridge, Mass.

[5] Cheng, B., Ni, B., Yan, S., Tian, Q. (2010, October). Learning to photograph. In Proceedings of the 18th ACM international conference on Multimedia (pp. 291-300). ACM.

[6] Ciesielski, V., Barile, P., Trist, K. (2013). Finding image features associated with high aesthetic value by machine learning (pp. 47-58). Springer Berlin Heidelberg.

[7] Dalal, N., Triggs, B. (2005, June). Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on (Vol. 1, pp. 886-893). IEEE.

[8] Datta, R., Joshi, D., Li, J., Wang, J. Z. (2006). Studying aesthetics in photographic images using a computational approach. In Computer Vision – ECCV 2006 (pp. 288-301). Springer Berlin Heidelberg.

[9] Dhar, S., Ordonez, V., Berg, T. L. (2011, June). High level describable attributes for predicting aesthetics and interestingness. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on (pp. 1657-1664). IEEE.

[10] Haas, A. F., Guibert, M., Foerschner, A., Calhoun, S., George, E., Hatay, M., ... Felts, B. (2015). Can we measure beauty? Computational evaluation of coral reef aesthetics. PeerJ, 3, e1390.


[11] He, K., Sun, J., Tang, X. (2011). Single image haze removal using dark channel prior. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 33(12), 2341-2353.

[12] Hoiem, D., Efros, A. A., Hebert, M. (2007). Recovering surface layout from an image. International Journal of Computer Vision, 75(1), 151-172.

[13] Itten, J. (1973). The art of color: The subjective experience and objective rationale of color. New York: John Wiley.

[14] Joshi, D., Datta, R., Fedorovskaya, E., Luong, Q. T., Wang, J. Z., Li, J., Luo, J. (2011). Aesthetics and emotions in images. Signal Processing Magazine, IEEE, 28(5), 94-115.

[15] Ke, Y., Tang, X., Jing, F. (2006, June). The design of high-level features for photo quality assessment. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on (Vol. 1, pp. 419-426). IEEE.

[16] Lai, C. Y., Chen, P. H., Shih, S. W., Liu, Y., Hong, J. S. (2010). Computational models and experimental investigations of effects of balance and symmetry on the aesthetics of text-overlaid images. International Journal of Human-Computer Studies, 68(1), 41-56.

[17] Li, C., Loui, A. C., Chen, T. (2010, October). Towards aesthetics: a photo quality assessment and photo selection system. In Proceedings of the 18th ACM international conference on Multimedia (pp. 827-830). ACM.

[18] Li, C., Chen, T. (2009). Aesthetic visual quality assessment of paintings. Selected Topics in Signal Processing, IEEE Journal of, 3(2), 236-252.

[19] Liensberger, C., Stöttinger, J., Kampel, M. (2009, October). Color-based and context-aware skin detection for online video annotation. In MMSP (pp. 1-6).

[20] Luo, W., Wang, X., Tang, X. (2011, November). Content-based photo quality assessment. In Computer Vision (ICCV), 2011 IEEE International Conference on (pp. 2206-2213). IEEE.

[21] Machado, P., Cardoso, A. (1998). Computing aesthetics. In Advances in artificial intelligence (pp. 219-228). Springer Berlin Heidelberg.

[22] Machajdik, J., Hanbury, A. (2010, October). Affective image classification using features inspired by psychology and art theory. In Proceedings of the international conference on Multimedia (pp. 83-92). ACM.

[23] Nishiyama, M., Okabe, T., Sato, I., Sato, Y. (2011, June). Aesthetic quality classification of photographs based on color harmony. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on (pp. 33-40). IEEE.


[24] Oliva, A., Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145-175.

[25] Quercia, D., O'Hare, N. K., Cramer, H. (2014, February). Aesthetic capital: what makes London look beautiful, quiet, and happy?. In Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing (pp. 945-955). ACM.

[26] Redies, C., Amirshahi, S. A., Koch, M., Denzler, J. (2012, October). PHOG-derived aesthetic measures applied to color photographs of artworks, natural scenes and objects. In Computer Vision – ECCV 2012. Workshops and Demonstrations (pp. 522-531). Springer Berlin Heidelberg.

[27] Romero, J., Machado, P., Carballal, A., Osorio, O. (2011). Aesthetic classification and sorting based on image compression. In Applications of evolutionary computation (pp. 394-403). Springer Berlin Heidelberg.

[28] Romero, J., Machado, P., Carballal, A., Santos, A. (2012). Using complexity estimates in aesthetic image classification. Journal of Mathematics and the Arts, 6(2-3), 125-136.

[29] Segalin, C., Perina, A., Cristani, M., Vinciarelli, A. (2016). The pictures we like are our image: Continuous mapping of favorite pictures into self-assessed and attributed personality traits. Affective Computing, IEEE Transactions on, PP(99), 1-1.

[30] Schifanella, R., Redi, M., Aiello, L. M. (2015). An image is worth more than a thousand favorites: Surfacing the hidden beauty of Flickr pictures. In ICWSM '15: Proceedings of the 9th AAAI International Conference on Weblogs and Social Media. AAAI.

[31] Tamura, H., Mori, S., Yamawaki, T. (1978). Textural features corresponding to visual perception. Systems, Man and Cybernetics, IEEE Transactions on, 8(6), 460-473.

[32] Valdez, P., Mehrabian, A. (1994). Effects of color on emotions. Journal of Experimental Psychology: General, 123(4), 394.

[33] Van De Weijer, J., Schmid, C., Verbeek, J., Larlus, D. (2009). Learning color names for real-world applications. Image Processing, IEEE Transactions on, 18(7), 1512-1523.

[34] Viola, P., Jones, M. J. (2004). Robust real-time face detection. International Journal of Computer Vision, 57(2), 137-154.

[35] Wang, W., Cai, D., Wang, L., Huang, Q., Xu, X., Li, X. (2016). Synthesized computational aesthetic evaluation of photos. Neurocomputing, 172, 244-252.


[36] Wei-ning, W., Ying-lin, Y., Sheng-ming, J. (2006, October). Image retrieval by emotional semantics: A study of emotional space and feature extraction. In Systems, Man and Cybernetics, 2006. SMC '06. IEEE International Conference on (Vol. 4, pp. 3534-3539). IEEE.

[37] Wong, L. K., Low, K. L. (2009, November). Saliency-enhanced image aesthetics class prediction. In Image Processing (ICIP), 2009 16th IEEE International Conference on (pp. 997-1000). IEEE.

[38] Wong, L. K., Low, K. L. (2011, January). Saliency retargeting: An approach to enhance image aesthetics. In Applications of Computer Vision (WACV), 2011 IEEE Workshop on (pp. 73-80). IEEE.

[39] Zhang, L., Gao, Y., Zimmermann, R., Tian, Q., Li, X. (2014). Fusion of multichannel local and global structural cues for photo aesthetics evaluation. Image Processing, IEEE Transactions on, 23(3), 1419-1429.
