#trendingfashion · the instagram api gives access to tag data like counts and recent posts, media...

7
#TrendingFashion Grace Park Caltech ’16 [email protected] JD Co-Reyes Caltech ’16 [email protected] Ella Mathews Caltech ’15 [email protected] Aditya Karan Caltech ’16 [email protected] ABSTRACT Jumpsuits, maroon ankle boots, and colorful prints are ”in.” Plain black leggings, pastel colors, and Ugg boots are ”so last season.” Finding out what’s currently in style is not always easy. The #TrendingFashion project endeavors to solve that problem. Our site leverages social media to provide users with up-to-date information about current fashion trends. Categories and Subject Descriptors H.4 [Information Systems Applications]: Miscellaneous; D.2.8 [Software Engineering]: Metrics—complexity mea- sures, performance measures General Terms Fashion Detection Keywords Fashion, Data crawling, Classification 1. INTRODUCTION Fashion has always played a large role in culture. It can rep- resent anything from a nationality to an individual’s sense of self. Despite its importance, fashion is not always easy to follow. In the past, the fashion industry defined what was trending. Designers and clothing stores predicted the trends in fashion years before the clothes actually hit the market. These creators were the major sources of new styles, and strongly influenced the direction of the trends. Consumers were limited to following celebrity fashion, runway models, and their close friends’ styles. Trends and sources were well defined. Sites and magazines like Vogue and Elle acted as the authority, directly telling people what was ”in”without any way for the user to provide feedback. The rise of social media has changed the face of fashion. Sites such as Pinterest, Twitter, and Instagram provide plat- forms for users to interact with each other and form tightly connected networks. These networks allow for a bidirec- tional spread of trends between consumers and producers of fashion. Social media offers instant feedback for the design- ers and clothing lines, which then affects the future produc- tion and direction of the trends to come. With the influence of social media, trends may pop back up as fast as in a few weeks or elongate the period of time it takes to resurface. Reposting and widely spreading a certain trend through so- cial media might drastically extend the duration of popu- larity - Van’s shoes or denim jackets, for example. On the other hand, the constant posting of that fad may wear out the novelty and edge of the trend and flush it out of the network faster. The rapidly changing structure of the industry leaves new- comers to fashion at a disadvantage. No longer is there an obvious source for fashion inspiration and guidance. Run- ways are fantastic sources for the professional fashionista and fashion show goer, but runway fashion is often bizarre and outlandish. Magazines cannot keep up with the speed of online trends. Even social media itself is lacking. Sites like Pinterest and Instagram make following individuals’ styles simple, but don’t provide a way to see a bigger picture of the fashion world. Therefore, we wanted to provide a source of fashion inspiration and trend-tracking that directly interacts with social media without requiring background knowledge of the users. 2. SOCIAL MEDIA SOURCE The first step in implementing this project was choosing a social media site from which to collect fashion data. After looking at many other options such as Pinterest, Facebook, and Twitter, we found Instagram to be the ideal option. 2.1 Instagram Basics Instagram is a photo-based sharing site where users can up- load photos, follow other users’ accounts, and like and com- ment on other photos. Photos are also tagged with relevant topics such as ”#fashion” or ”#katespadepurse.” Users can also explore photos based on these tags. Displayed photos are ordered by the time they were posted. 2.2 Choosing Instagram There were three main reasons we picked Instagram over the alternatives. First, the photo-heavy nature of the site lends itself well to fashion related posts. Text-only posts that might appear on sites like Facebook, Twitter, or Tumblr are

Upload: others

Post on 26-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: #TrendingFashion · The Instagram API gives access to tag data like counts and recent posts, media data including comments and likes, and user data which includes number of posts

#TrendingFashion

Grace ParkCaltech ’16

[email protected]

JD Co-ReyesCaltech ’16

[email protected] Mathews

Caltech ’[email protected]

Aditya KaranCaltech ’16

[email protected]

ABSTRACTJumpsuits, maroon ankle boots, and colorful prints are ”in.”Plain black leggings, pastel colors, and Ugg boots are ”so lastseason.” Finding out what’s currently in style is not alwayseasy. The #TrendingFashion project endeavors to solve thatproblem. Our site leverages social media to provide userswith up-to-date information about current fashion trends.

Categories and Subject DescriptorsH.4 [Information Systems Applications]: Miscellaneous;D.2.8 [Software Engineering]: Metrics—complexity mea-sures, performance measures

General TermsFashion Detection

KeywordsFashion, Data crawling, Classification

1. INTRODUCTIONFashion has always played a large role in culture. It can rep-resent anything from a nationality to an individual’s senseof self. Despite its importance, fashion is not always easy tofollow. In the past, the fashion industry defined what wastrending. Designers and clothing stores predicted the trendsin fashion years before the clothes actually hit the market.These creators were the major sources of new styles, andstrongly influenced the direction of the trends. Consumerswere limited to following celebrity fashion, runway models,and their close friends’ styles. Trends and sources were welldefined. Sites and magazines like Vogue and Elle acted asthe authority, directly telling people what was ”in” withoutany way for the user to provide feedback.

The rise of social media has changed the face of fashion.Sites such as Pinterest, Twitter, and Instagram provide plat-forms for users to interact with each other and form tightly

connected networks. These networks allow for a bidirec-tional spread of trends between consumers and producers offashion. Social media offers instant feedback for the design-ers and clothing lines, which then affects the future produc-tion and direction of the trends to come. With the influenceof social media, trends may pop back up as fast as in a fewweeks or elongate the period of time it takes to resurface.Reposting and widely spreading a certain trend through so-cial media might drastically extend the duration of popu-larity - Van’s shoes or denim jackets, for example. On theother hand, the constant posting of that fad may wear outthe novelty and edge of the trend and flush it out of thenetwork faster.

The rapidly changing structure of the industry leaves new-comers to fashion at a disadvantage. No longer is there anobvious source for fashion inspiration and guidance. Run-ways are fantastic sources for the professional fashionistaand fashion show goer, but runway fashion is often bizarreand outlandish. Magazines cannot keep up with the speed ofonline trends. Even social media itself is lacking. Sites likePinterest and Instagram make following individuals’ stylessimple, but don’t provide a way to see a bigger picture of thefashion world. Therefore, we wanted to provide a source offashion inspiration and trend-tracking that directly interactswith social media without requiring background knowledgeof the users.

2. SOCIAL MEDIA SOURCEThe first step in implementing this project was choosing asocial media site from which to collect fashion data. Afterlooking at many other options such as Pinterest, Facebook,and Twitter, we found Instagram to be the ideal option.

2.1 Instagram BasicsInstagram is a photo-based sharing site where users can up-load photos, follow other users’ accounts, and like and com-ment on other photos. Photos are also tagged with relevanttopics such as ”#fashion” or ”#katespadepurse.” Users canalso explore photos based on these tags. Displayed photosare ordered by the time they were posted.

2.2 Choosing InstagramThere were three main reasons we picked Instagram over thealternatives. First, the photo-heavy nature of the site lendsitself well to fashion related posts. Text-only posts thatmight appear on sites like Facebook, Twitter, or Tumblr are

Page 2: #TrendingFashion · The Instagram API gives access to tag data like counts and recent posts, media data including comments and likes, and user data which includes number of posts

Figure 1: Overall infrastructure layout. Posts are crawled throughout Instagram, which we then store inDropbox. This data is used to generate time trend data as well as filtered out using fashion detection. Theseresults are combined to output to the website.

not as useful for users trying to get a sense of hot trends,given that fashion is an aesthetic subject.

Second, we wanted to take advantage of the tagging styleof Instagram. Each post can have many tags, and they aredefined by the poster. Tags provide us with a baseline filterfor fashion-related posts.

Third, we found that the API Instagram provides best suitedour project’s needs. It allows our crawler to access posts at arate that makes updating trends often more feasible than theAPIs provided by other sites that were otherwise suitable,such as Pinterest.

3. DATA COLLECTIONWe detect trending tags on Instagram and then find popularphotos for each trending tag. We also find influential usersrelation to fashion. This process involves crawling tag countdata, detecting trending tags, crawling for posts with highnumbers of likes, and detecting users with both high numberof fashion related posts and likes.

3.1 Tag count crawlerThe Instagram API gives access to tag data like counts andrecent posts, media data including comments and likes, anduser data which includes number of posts and likes. Becausethe API only gives the counts of a given tag at that partic-ular moment, our system must crawl the counts of any tagsrelated to fashion periodically. We start with a list of ap-proximately 1000 seed tags related to fashion and clothingcompiled from an initial crawl of Instagram and from a list

of fashion brands. Every hour, the crawler requests the tagcount (number of posts with that tag) through the Insta-gram API and saves the data. Each request for the countsof a single tag also returns the tag counts of 19 other similartags so we save this additional data. The crawler recordsthe counts of approximately 20,000 different tags along withthe time stamp. The API does not give geo-specific countdata and so the counts take into account posts globally.

3.2 Trend DetectionWe first filter out tags based on a shadow tags list. Thislist has been generated from the image filter component(described below) and from manual inspection of the tags.We then use the Holt-Winter’s method to forecast populartags. The Holt-Winter’s method performs triple exponentialsmoothing to take into account small fluctuations and anyseasonal trends in the data. In our case, season could cycli-cal patterns over the day like posting being more popular atnight or weekly patterns like more posts occurring on week-ends. Suppose we have a sequence of counts {xt} beginningat time t = 0 with a cycle of seasonal change of length L. Welet {st} represent the smoothed value for time t and {bt} bethe sequence of best estimates of the linear trend superim-posed on the seasonal changes. {ct} is the sequence of seasoncorrectional factors and ct is the expected proportion of thepredicted trend at any time t mod L. Ft+m is the final out-put and the estimate of the value of x at time t+m based onthe raw data counts up to time t. Then triple exponentialsmoothing is given by the following formulas [1].

Page 3: #TrendingFashion · The Instagram API gives access to tag data like counts and recent posts, media data including comments and likes, and user data which includes number of posts

s0 = x0 (1)

st = αxtct−L

+ (1− α)(st−1 + bt−1) (2)

bt = β(st − st−1) + (1− β)bt−1 (3)

ct = γxtst

+ (1− γ)ct−L (4)

Ft+m = (st +mbt)ct−L+1+(m−1) mod L, (5)

where 0 < α < 1, 0 < β < 1, and 0 < γ < 1 are the data,trend, and seasonal change smoothing factors respectively.Usually the minimum amount of time series data requiredis atleast 2*L since to estimate the initial trend estimate b0we use the formula.

b0 =1

L

(xL+1 − x1L

+xL+2 − x2

L+ . . .+

xL+L − xLL

)(6)

To get the initial seasonal factors ci for i from 1 to L we use

ci =1

N

N∑j=1

xL(j−1)+i

Aj(7)

where Aj is the average value of x in the jth cycle of thetime series data.

Aj =

∑Li=1 xL(j−1)+i

L(8)

Using this method on the tag count time series data we canthen predict what tags will have the greatest percent changein number of posts. We consider tags that are going toincrease the most relative to their own count to be trendingtags. From these trending tags we compile a list of toptrending tags to crawl. We diversify this list by computingthe Levenshtein distance between any two tags a and b givenby the following formula.

lev(i, j) =

max(i, j) if min(i, j) = 0,

min

leva,b(i− 1, j) + 1

leva,b(i, j − 1) + 1

leva,b(i− 1, j − 1) + 1(ai 6=bj)

otherwise.

Tags that are similar to a word already in the top tags list arenot added. So for example, the more popular of ’chanelhand-bags’ and ’fenihandbags’ would be added to the list. Thisavoids having multiple items of the same type and multileitems of the same brand.

3.3 Media post crawlerThe media post crawler finds the top posts for a given tagfrom the top trending tags list computed from the trenddetection system. While the tag count crawler runs roughlyevery hour, the media post crawler runs twice a day. Wedid not observe any rapid changes in tag counts to warrantcrawling more frequently.

Given the crawl period of roughly half a day, the crawlershould find all popular posts for a given tag posted in the

last twelve hours. The number of posts for a given tag perday can range anywhere to 500 to 25,000. The InstagramAPI orders media for a tag based on recency which is whena post was tagged with that tag. For a given tag, the crawlerfinds the 5000 most recent posts and then saves the top 100posts with the most likes.

3.4 Influential User DetectionBased on the posts the media post crawler finds, we try todetect which users are the most influential. We first filterout users with low numbers of likes or shares. Then we findusers that both have posts with trending tags and posts withhigh numbers of likes. These top users give a sense of whoto follow on Instagram for emerging fashion trends.

4. IMAGE FILTERINGEven after looking through fashion related tags, a lot of im-ages gathered were still not relevant for fashion. Images ofscenery, cars, etc. often came up. In order to improve theimage quality of the website, we implemented a classifica-tion scheme to distinguish between fashion and non-fashionimages

To implement this, we exploited the Scale-invariant featuretransformation (SIFT). To compute the SIFT features, thereare four major stages: extrema detection, keypoint local-ization, orientation computing, and creating the keypointdescriptors.

For efficient detection of keypoints, we need to compute bothover space and different scales: at small scales we couldsee small differences, but over larger scales, we might seea bigger change, such as a foreground, background change.We use a Gaussian as the scale-space kernel

To do this, we compute an approximation to the Laplacianof the Gaussian over different scales. The Lapalacian of theGaussian, in this case, is

∂G

∂σ= σ∇2G (9)

where G(x, y, σ) = 12πσ2 e

− (x2+y2)

2σ2 From this, we can do alinear approximation, which gives us

G(x, y, k, σ)−G(x, y, σ)

kσ − σ (10)

Therefore, as approximation, we just compute the differencein Gaussian over the different scales. which is a slightly moreefficient approximation. This gives us.

D(x, y, σ) = (G(x, y, kσ)−G(x, y, σ))I(x, y)

= L(x, y, kσ)− L(x, y, σ) (11)

Where I(x, y) is the Image representation in 2D space, and∗ is the convolution operator, and

Page 4: #TrendingFashion · The Instagram API gives access to tag data like counts and recent posts, media data including comments and likes, and user data which includes number of posts

Figure 2: Visualization of SIFT keypoints without orientation. Here the circles represent where the SIFTalgorithm has determined are the important points. The relative location and later orientation is then usedto help cluster and identify different types of images

Once a set of possible keypoints are identified, they are fur-ther localized, as to eliminate regions that have low contrastor keypoints that are representing edges instead of usefulfeatures. To do this, we take the Taylor expansion of D(x)

D(x) = D +∂D

∂xx +

1

2xT ∂

2D

∂x2x (12)

We then take the derivative of this and find the local ex-trema x̄ The derivative is approximated by using the differ-ences of neighboring points. For local extrema, with a norm|D(x̄)| < δ, we discard these candidates as areas with toosmall contrast. In most applications δ = 0.03 serves well.

Once the regions of these keypoints are determined, an ori-entation to each keypoint is assigned to get invariance toimage rotation. Gradient orientations are computed from aregion around the keypoint, and binned into 36 bins cover-ing 360 degrees. Each sample is weighted by the magnitude.An orientation histogram is created, and one takes the high-est peak in the histogram as well as other peaks that arewithin 80% of the histogram. While this creates a separatekeypoint for the point, the different orientations help distin-guish important features within the image

From these keypoints, we create a descriptor for each one, inwhich we take a 16× 16 neighborhood, divided into 16 sub-blocks, in which we create a 8 b in orientation histogram,which totals to 128 bins, which we interpret at the 128 di-mension vector.

From this process, keypoints are identified and localized.Figure 2 has an example of visualized SIFT features. [3]

After generating these keypoints for each image, we imple-mented a nearest neighbors clustering scheme. From Kamb-hampati’s research on trends on Instagram, the author no-tices about 10 different classes of images. [2]. We use thesecategories as the basis of our classification scheme. Afterfiddling around with the images, we decided to expand thisclassification to twelve schemes, which was meant to helpdifferentiate between face shots (selfies), full body shots ofindividuals and group photos.

A representative set of images were used as a model for thesecategories. Afterwards, we collecting training data based onpulling the most recent post from Instagram periodicallythroughout the day. After manual classification of about200 images, and out of sample testing of about 400 images,we came up with our classifier. Out of all of the possibleclassifications, we let images that were classified as ”fash-ion items”, ”face”, ”single person”, and ”groups” be noted asfashion related items, and the remaining classes as not fash-ion related items.

We used this filtering mechanism for two purposes. One wasto help determine fashion related tags. While crawling, listof images associated with specific tags would be stored. Wewould go through these large list of media associated withthese tags, and classify each one of them as fashion or not

Page 5: #TrendingFashion · The Instagram API gives access to tag data like counts and recent posts, media data including comments and likes, and user data which includes number of posts

Figure 3: Sample predictions of the classification scheme.

fashion. If the percent of these images were below somethreshold, we would consider this tag, not fashion, and ex-clude the tag from future crawling. In this manner, theimage classification would feedback into the crawling, andovertime improve crawling results.

The second use for image based classification was to narrowdown top images. The initial crawling would produce a setof 100 tags with 10 images each. The image filter could gothrough each of those tags, and determine if those imageswere also fashion related or not. This was used to help selectwhich tags, and images to display onto the final website,tags that had a high amount of non-fashion related imageswould be removed, and even tags that tended to have mostlyfashion related items, would have non-fashion related itemsremoved as to ensure proper image quality.

5. TREND VISUALIZATIONSWe used d3.js to create visualizations of the trends we foundwith respect to time. The graphs compare total count ofphotos using a given trending tag over time. Note that thismeans the counts are cumulative over time.

5.1 Comparing TrendsAs a measure of accuracy, we can compare our graphs toGoogle Trend results for the same topics. Google’s trendreports display interest in a topic at any given time basedon the number of times it is searched. The graphs are thennormalized to a range of zero to one-hundred. This meansthere are no absolute numbers, but the popularity of differ-ent topics can be compared.

Figure 4: D3.js trend visualizations for #katespade-bag and #fendipeekaboo.

Figure 5: Google Trend results for ”Kate SpadeBag” and ”Fendi Peekaboo”.

We can see that the two graphs show similar results. BothKate Spade bags and Fendi Peekaboo bags received a rea-sonably consistent amount of attention for the given timeframe, and both graphs show that Kate Spade was consis-tently more popular.

Page 6: #TrendingFashion · The Instagram API gives access to tag data like counts and recent posts, media data including comments and likes, and user data which includes number of posts

5.2 One-time EventsGiven our method of trend detection, our site is particu-larly receptive to event-based trends. These are trends thatappear very quickly based on some specific event, and dis-appear very soon after.

An example of this is the popularity of Floyd Mayweather’sclothing brand on Instagram the weekend of the Mayweather-Pacquiao fight in early May. Both our data and GoogleTrends show the spike of attention the topic received justafter the fight on May second.

Figure 6: D3.js trend visualization for #moneymay-weather. The topic saw a rapid rise in poularity, but,as indicated by the plateau, had a short lifespan asa popular trend.

Figure 7: Google Trends for ”Money Mayweather”.Note that these results also include searches for”Money Mayweather” regarding the person, and notnecessarily the the clothing brand. However, thesame spike in attention seen in the fashion datagraphs on our site is displayed here as well.

6. FRONT-ENDThe product has been deployed to the URL:www.fashionprecis.com with the name ”Fashion Precis.”

6.1 Server set upThe website is hosted by Heroku. The backend is startedusing Node.js express module, and the website itself usesjade, html, bootstrap for layout, and css for styling. Al-though a database is set up, the data transfer is so minimalthat we store our information on Dropbox. The website isauto-deployed every two hours to deliver current trendingposts.

6.2 InterfaceThe main page of the website displays thirty-two top trend-ing Instagram tags with relevant photos, Google Trends graphs,and a link to buy an article of clothing related to the tag.

The website also has pages which display top ten fashionrelated Instagrammers for the user to directly follow as wellas the trend visualizations discussed earlier.

Figure 8: Top 32 Image page. Users can scrollthrough these images and, if any of the items ap-pear interesting, the user can click to learn more.

Figure 9: After clicking on a picture, the user getsother related pictures as well as Google Trend infor-mation

Page 7: #TrendingFashion · The Instagram API gives access to tag data like counts and recent posts, media data including comments and likes, and user data which includes number of posts

Figure 10: Top Instagrammers page on the website.These are determined by the users that have a highnumber of likes or shares and have fashion relatedtags

The design is minimalistic to match the current trends in UIdesign and to allow the user to focus on only the importantcontents.

7. FUTURE WORKSCurrently, this product utilizes data from Instagram, due tothe ease of use of its API and relatability to fashion content.To correctly represent the trendiness of a style correctly,information must be gathered from a variety of sources asinitially planned. Additional potential sources include othersocial media: Facebook, Twitter, and Pinterest; designers:Michael Kors, Kate Spade, and Marc Jacobs; and stores: for-ever21, Urban Outfitters, and Anthropology, to name a few.A weighting system would be implemented to give differ-ent sources an appropriate level of influence on determiningwhat is actually trending.

7.1 Other capabilitiesFashion is an idea that cannot be generalized to fit everyone.The future product would include user-specific personaliza-tion.

The user can filter by the type of article of clothing that theywish to explore. For example, a high school senior lookingfor a prom dress would filter only by the keyword ”dress.”

The user can input their age, gender, and location for theproduct to automatically suggest clothing fitting for thatgroup of people. Alternatively, the user can click a varietyof keywords and/or images that best describe their style,and from those choices, the product would find fashion thatthe specific user would enjoy. The user can continue to givefeedback by clicking an up or down button to allow the prod-uct to learn what is the user’s style really.

7.2 Item SegmentationIn the literature, efforts have also been made to identifyspecific fashion related objects in photos. [4] In the future,it would, therefore be useful, to look through images onsocial media, and identify the type of fashion item a certainpicture has (boots/scarves, etc.) and identify trends purelybased on the type of items within the image.

7.3 MarketingThis product caters to the social media users and utilizes thesignificant demand in fashion information of recent times.By featuring products, styles, designers, and stores that sellor designed the particular article of clothing that is trend-ing at the time, the sources of this fashion benefit from theadvertisement. This way, the product brings in income fromthe fashion sources who would actively like more advertise-ment from this website or app.

Another option that the product provides is a method topurchase the item immediately. Retail stores will take ad-vantage of this redirect and pay to be sponsored on the web-site or app as a seller for particular articles of clothing.

8. ACKNOWLEDGEMENTSWe would like to thank Adam Wierman for all of his feed-back and guidence for this project.

9. REFERENCES[1] C. Croarkin. Nist sematech e-handbook of statistical

methods, 2013.

[2] Y. Hu, L. Manikonda, and S. Kambhampati. What weInstagram:A First Analysis of Instagram PhotoContent and User Types. Proceedings of the EighthInternational AAAI Conference on Weblogs and SocialMedia, pages 595–598, 2014.

[3] D. G. Lowe. Distinctive image features fromscale-invariant keypoints. International journal ofcomputer vision, 60(2):91–110, 2004.

[4] K. Yamaguchi, M. H. Kiapour, L. E. Ortiz, and T. L.Berg. Parsing clothing in fashion photographs. InComputer Vision and Pattern Recognition (CVPR),2012 IEEE Conference on, pages 3570–3577. IEEE,2012.