Reverse Engineering Twitter Hashtag Algorithm

Uploaded by: marat-zhanikeev

Posted on 16-May-2015


DESCRIPTION

Twitter today markets its main features, specifically the hashtag, through neat infographics that explain how they should be used. The phrase used in most of these infographics is "contributing value to conversation". Since no engineering logic accompanies the phrase, there is no way to know what it means in practice. This paper proposes a model for collecting and processing data about the hashtag algorithm and for visualizing its behavior relative to a user's own account. A software implementation is also provided.

TRANSCRIPT

Page 1: Reverse Engineering Twitter Hashtag Algorithm

Page 2

Contributions

1. a new method for crawling social networks

2. a framework that social media practitioners can use to evaluate impact
   ◦ impact = the probability that tweets show up in hashtag streams

3. an example analysis based on the above

The goal is to reverse engineer the hashtag algorithm.

M. Zhanikeev | Reverse Engineering Twitter Hashtag Algorithm | http://bit.do/marat140614

Page 3

Twitter Hashtags

Page 4

Hashtag Streams

Hashtag streams are the streams of tweets that show up when people search Twitter.

• hashtags are the best way to search

• note: Twitter is trying to phase out hashtags (and mentions), so search may find tweets even without hashtags

Hashtags are important because they are used in social media marketing to promote events, products, etc.

Page 5

Twitter Infographics

• Twitter promotes hashtags by releasing infographics

• the content is very confusing for social media practitioners

• it is hard to translate into numbers, concrete actions, etc.

Page 6

Twitter Infographics (2) : Zoom-Ins

Page 7

Twitter Infographics (3) : Cleanup

[Figure: cleaned-up decision flowchart with the nodes Decide, New Tag?, Will you promote it?, Will you add value?, Add to hashtag stream, and Out, connected by YES/NO branches]

• with all the clutter cleaned out, the decision algorithm becomes much clearer

• however, it still does not clarify what "value" or "promotion" means in practice

• since Twitter does not help, we need to reverse engineer the algorithm
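To make the gap concrete, the cleaned-up flowchart can be written as a three-input function. This is a minimal sketch that assumes one plausible wiring of the YES/NO arrows (the extracted slide does not preserve it); the function name and its parameters are hypothetical. Every input is a yes/no answer that Twitter never defines in measurable terms, which is exactly what has to be reverse engineered.

```python
def hashtag_decision(is_new_tag: bool, will_promote: bool, adds_value: bool) -> str:
    """Hypothetical encoding of the cleaned-up infographic flowchart.

    The YES/NO wiring is an assumption: a brand-new tag is only worth using
    if you will promote it, and a tweet enters the hashtag stream only if it
    adds value.
    """
    if is_new_tag and not will_promote:
        return "out"  # new tag that nobody will promote
    if not adds_value:
        return "out"  # tweet adds no value to the conversation
    return "add to hashtag stream"


# Each argument is a yes/no answer, but nothing says how to measure "promote" or "value".
print(hashtag_decision(is_new_tag=False, will_promote=False, adds_value=True))
```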

Page 8

Crawling vs Sampling

Page 9

Crawling : Practice and Problems

• traditional crawling is done on the command line using wget or curl

• problem 1: Twitter and others try to avoid being crawled and have created fences (login, cookies, forwarding, JS post-loading, etc.)

• problem 2: official APIs are very restricted; the Twitter API does not cover search

• problem 3: it is hard to use other services while crawling, e.g. Twitter + YouTube

Page 10

Snowball Sampling

• the new way to look at sampling
• done in cycles:
   1. sample something
   2. select a wanted subset
   3. sample the subset at a higher depth
   4. repeat

• snowball sampling is directly applicable to crawling Twitter
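The sample/select/deepen cycle can be sketched as follows. This is a minimal, hypothetical illustration on an in-memory toy graph: in the browser-based approach described here, the `neighbors` step would be page visits in the Twitter web app rather than a dictionary lookup, and `keep` stands in for whatever selection rule the crawler applies. All names and data are assumptions.

```python
from typing import Callable, Dict, Iterable, List, Set


def snowball_sample(
    seeds: Iterable[str],
    neighbors: Callable[[str], List[str]],
    keep: Callable[[str], bool],
    rounds: int,
) -> Set[str]:
    """Run `rounds` cycles of sample -> select -> go one hop deeper."""
    sampled: Set[str] = set()
    frontier: Set[str] = set(seeds)
    for _ in range(rounds):
        sampled |= frontier                          # 1. sample the current frontier
        selected = {n for n in frontier if keep(n)}  # 2. select a wanted subset
        deeper: Set[str] = set()
        for node in selected:                        # 3. sample that subset one hop deeper
            deeper.update(neighbors(node))
        frontier = deeper - sampled                  # 4. repeat with unseen nodes only
        if not frontier:
            break
    return sampled | frontier  # include nodes discovered in the last cycle


# Toy graph (hypothetical): a hashtag leads to accounts, accounts lead to other accounts.
FOLLOW_GRAPH: Dict[str, List[str]] = {
    "#tag": ["@alice", "@bot1"],
    "@alice": ["@bob", "@carol"],
    "@bot1": ["@dave"],
    "@bob": ["@erin"],
}

sample = snowball_sample(
    seeds=["#tag"],
    neighbors=lambda n: FOLLOW_GRAPH.get(n, []),
    keep=lambda n: not n.startswith("@bot"),  # hypothetical filter: do not expand bot accounts
    rounds=2,
)
print(sorted(sample))
```

Because `@bot1` is sampled but not selected, its neighbor `@dave` is never reached; raising `rounds` or loosening `keep` grows the sample.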

Page 11

Crawling : Two Approaches

• approach 1 (traditional) : use APIs (HTTP, OAuth, etc.) to get data

• approach 2 (proposed) : attach your robot to a working Twitter web app in the browser
   ◦ interaction is via clicks, just like a human
   ◦ more natural

Page 12

Implementation

Page 13

Implementation : Twaater

• Chrome extension, auto-triggered by loading a Twitter page

• stores logs in the user's own Dropbox drive

Page 14

Implementation : Twaater

• https://github.com/maratishe/twaater
• personalization:
   1. change the Dropbox auth tokens to point to your own drive
   2. log in to Twitter under your own account and let Twaater take it from there

• runs continuously; close the browser when you want to stop

Page 15

Example Analysis

Page 16

Twaater : Metric Space

• tweet metrics/counts: links, retweets, favorites, tags, tagstatus, mentions

• + account metrics/counts: tweets, following, followers
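As a data structure, this metric space amounts to one record per observation of a tweet and its author. The sketch below is an assumption about how such a record could look, not Twaater's actual log format; the class name, the tuple constants, and the integer encoding of `tagstatus` are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Dict

TWEET_METRICS = ("links", "retweets", "favorites", "tags", "tagstatus", "mentions")
ACCOUNT_METRICS = ("tweets", "following", "followers")


@dataclass(frozen=True)
class MetricSnapshot:
    """One observation of a tweet plus its author's account (all counts)."""
    # tweet metrics
    links: int
    retweets: int
    favorites: int
    tags: int
    tagstatus: int  # assumption: a count expressing the hashtag's topic popularity
    mentions: int
    # account metrics
    tweets: int
    following: int
    followers: int

    def as_dict(self) -> Dict[str, int]:
        return asdict(self)  # field order follows the class definition


snap = MetricSnapshot(links=1, retweets=3, favorites=5, tags=2, tagstatus=40,
                      mentions=0, tweets=1200, following=300, followers=450)
print(snap.as_dict()["favorites"])
```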

Page 17

Twaater : Tweet Timeline

• all metrics change over time
• the timeline of each individual tweet is therefore very important

• the timeline aggregates the tweet's status and its position (if any) in hashtag streams
   ◦ for each hashtag contained in the tweet
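A per-tweet timeline can be pictured as parallel time series: one per metric and one per (hashtag, stream) pair. This is a minimal sketch under assumptions; the class and field names are hypothetical, the two stream kinds "all" and "top" are taken from the analysis slide, and a missing position is stored as None to mean "not shown in that stream".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

STREAMS = ("all", "top")  # the two hashtag stream kinds mentioned in the analysis
Key = Tuple[str, str]     # (hashtag, stream)


@dataclass
class TweetTimeline:
    """Hypothetical per-tweet log of metric values and hashtag-stream positions."""
    tweet_id: str
    hashtags: Tuple[str, ...]
    times: List[float] = field(default_factory=list)
    metrics: Dict[str, List[int]] = field(default_factory=dict)
    positions: Dict[Key, List[Optional[int]]] = field(default_factory=dict)

    def observe(self, t: float, metric_values: Dict[str, int], seen_at: Dict[Key, int]) -> None:
        """Record one snapshot; assumes the same metric names are passed every time.

        seen_at maps (hashtag, stream) to the tweet's list position; a missing
        key means the tweet was not in that stream and is stored as None.
        """
        self.times.append(t)
        for name, value in metric_values.items():
            self.metrics.setdefault(name, []).append(value)
        for tag in self.hashtags:
            for stream in STREAMS:
                self.positions.setdefault((tag, stream), []).append(seen_at.get((tag, stream)))


tl = TweetTimeline("t1", hashtags=("#ai",))
tl.observe(0.0, {"favorites": 0}, {("#ai", "all"): 3})
tl.observe(60.0, {"favorites": 4}, {("#ai", "all"): 12, ("#ai", "top"): 2})
print(tl.positions[("#ai", "top")])  # [None, 2]
```

Filling every (hashtag, stream) slot on each observation keeps all series the same length, so any metric series can later be paired index by index with any position series.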

Page 18

Analysis : Rules and CCF

• lists : time series of metrics versus time series of positions in hashtag streams
   ◦ ccf(metric values, hashtag positions)
   ◦ note that there are separate all and top hashtag streams

• selection : find the maximum of each time series and filter the lists by a threshold relative to that maximum
   ◦ thresholds are different for each metric
   ◦ this helps to filter out noise or to focus only on large (important) values

• presence in hashtag streams is viewed either as a binary (yes/no) value or as an analog (list position) value

• extras (future work) : analysis along the timeline, at much higher complexity
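The CCF rule with threshold selection can be sketched as follows. This is a hedged illustration, not the author's code: it assumes "ccf" means the zero-lag normalized cross-correlation (Pearson r), which is consistent with the ±1 range of the result plots, and that the threshold drops samples whose metric value is below a fraction of that metric's maximum. The function names, the mode strings, and the sample series are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def ccf0(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Zero-lag normalized cross-correlation (Pearson r); NaN when undefined."""
    n = len(xs)
    if n < 2:
        return math.nan
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan  # a constant series has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def filtered_ccf(metric: Sequence[float], positions: Sequence[Optional[int]],
                 threshold: float, mode: str) -> float:
    """Correlate a metric with hashtag-stream presence after threshold filtering.

    threshold: fraction of the metric's maximum below which samples are dropped.
    mode "binary": listed -> 1, not listed -> 0.
    mode "actual": use the list position, only where the tweet was listed.
    """
    if mode not in ("binary", "actual"):
        raise ValueError(f"unknown mode: {mode}")
    if not metric:
        return math.nan
    cutoff = threshold * max(metric)
    xs: List[float] = []
    ys: List[float] = []
    for value, pos in zip(metric, positions):
        if value < cutoff:
            continue  # drop small, noisy values
        if mode == "binary":
            xs.append(value)
            ys.append(1.0 if pos is not None else 0.0)
        elif pos is not None:
            xs.append(value)
            ys.append(float(pos))
    return ccf0(xs, ys)


favorites = [0, 1, 2, 5, 8, 10]        # hypothetical metric series
top_pos = [None, None, 40, 20, 10, 5]  # hypothetical positions (None = not listed)
print(filtered_ccf(favorites, top_pos, threshold=0.5, mode="actual") < 0)  # True
```

In "actual" mode a smaller position number means closer to the top of the list, so a metric that pushes tweets upward shows up as a negative ccf.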

Page 19

Analysis : Results

[Figure: ccf (y-axis, -1.05 to 1.05) versus threshold (x-axis, % of max, 0 to 0.5) for the metrics tags, links, mentions, retweets, favorites, tweets, following, followers, and tagstatus; four panels: all/binary, top/binary, all/actual, top/actual]

• binary: useless
• analog: filtering out very low values (which are the majority) helps reveal good correlation
   ◦ for example, favorites contribute to tweets showing up closer to the top of the lists

• account metrics: show no effect

• among large values, tagstatus (topic popularity) becomes prominent

Page 20

Future Work

• Twaater is centered on one's own account, which makes it possible to crowdsource/distribute crawling
   ◦ this fits the description of snowball sampling

• 2nd order statistics (CCF) did not reveal a simple hashtag algorithm
   ◦ more complicated models have to be tested

• alternatively, smarter filtering can also help
   ◦ select a subset of important tweets to subject to analysis

Page 21

That’s all, thank you ...
