
Social Network User Lifetime

Juan Lang
Department of Computer Science
University of California, Davis

S. Felix Wu
Department of Computer Science
University of California, Davis

Abstract—Online Social Network (OSN) operators are interested in promoting usage among their users, and try a variety of strategies to encourage use. Some recruit celebrities to their site, some allow third parties to develop applications that run on their sites, and all have features intended to encourage use. As important as usage is, we are unaware of any studies into what influences users to be active and to remain online. This paper is the first work studying the lifetime of OSN users, examining the factors that influence lifetime in one OSN, Buzznet. The major contributions of this work are the study of active lifetime, the features and behaviors that encourage activity, and the comparison of active lifetime to passive lifetime.

I. INTRODUCTION

Online Social Network (OSN) operators are interested in promoting usage among their users. Most are funded at least in part by advertising, so increased usage translates directly into increased revenue. Some social networks are funded directly by their users, so a loyal user base also translates directly into increased revenue for the networks' operators. Retaining active users is a challenge for existing OSNs. For example, 40% of Twitter accounts have no activity, and 60% of users who visit Twitter in one month fail to return the following month [1], while approximately half of all Facebook accounts crawled in one study had no activity [2].

This work investigates how different aspects of a user's behavior in one OSN, Buzznet¹, correspond to the user's lifetime on that OSN. Relative to the largest OSNs, Buzznet has a small user base, and it lacks some of the features available on larger OSNs such as Facebook. While Buzznet's smaller user base might be a disadvantage for research purposes, its small size allowed us to crawl the largest connected component (LCC) of the social graph, avoiding selection bias [3]. Because Buzznet has fewer features than competing OSNs, the lifetime of its users can be studied independently of features that might encourage increased usage. The insights gained could therefore help decide which features are the most beneficial for promoting lifetime.

The major contributions of this work are the study of lifetime as a distinct phenomenon and the identification of features that appear to influence it. Additionally, this work compares active lifetime to passive lifetime, examining to what extent they coincide and the factors that appear to influence passive lifetime. To our knowledge, ours is the first study into the factors that influence user lifetimes for any OSN.

¹ http://www.buzznet.com

The remainder of this work is organized as follows: Related work is discussed in Section II. Background for the work is discussed in Section III. Results are discussed in Section IV. Recommendations are given in Section V. We conclude in Section VI.

II. RELATED WORK

Understanding social network user lifetime may require evaluating the motivations for using OSNs. Ellison et al. [4] conducted a survey of 286 Facebook users, identifying a relationship between Facebook use and psychological well-being. Similarly, Joinson [5] conducted a survey of 241 Facebook users in order to identify self-reported motivations for using Facebook.

Examining user lifetime is related to studying churn, which has been studied in other contexts. Notably, Dasgupta et al. [6] studied churn in mobile telecom networks, and observed that users whose friends recently left the network are more likely to leave the network themselves. In telecom networks, users have clear dates on which they leave the network, which makes such analysis possible. Because most OSNs are free for their users, OSN accounts are more likely to become dormant than to be removed altogether, so no comparable departure date is usually available. Stutzbach and Rejaie [7] measured churn in peer-to-peer (P2P) networks, where peer lifetime is an important component of the network's availability. Peer lifetime in P2P networks measures how long a particular instance of a running peer remains connected to the network, which is a different phenomenon from the one we study here.

Wilson et al. [2] crawled 10 million Facebook profiles and observed that users are most likely to be active early in their profiles' lifetimes. They also observed that approximately half of all users crawled generated no interactions, which is similar to the Twitter results cited above [1]. Viswanath et al. [8] analyzed the activity graph of 60,000 users in the New Orleans network within Facebook, focusing only on pairs of users with activity between them. They showed that the fraction of messages sent over time decreases after a link is created between two communicating users, which further supports the observation that user activity decays over time.

2011 International Conference on Advances in Social Networks Analysis and Mining

978-0-7695-4375-8/11 $26.00 © 2011 IEEE

DOI 10.1109/ASONAM.2011.28


Benevenuto et al. [9] used a proxy server to capture accesses to several OSNs, including Orkut and MySpace, over a 12-day period. They observed that users are mainly voyeurs: 92% of user activity is reading, and 85% of users in their capture had no write activity.

III. BACKGROUND

We performed a BFS-based crawl of the Buzznet social graph until we had obtained the largest connected component, containing approximately 750,000 users and 9 million directed edges. For each of these users, we collected all of the public notes (posts from other users) the user had received, as well as all of the photos the user posted and the comments each photo received. In all, we retrieved approximately 5 million notes, 4 million photos, and 4 million photo comments. We did not collect other user activities, including videos and journal posts. By not crawling all available data, we substantially reduced the time required for our crawl, at the risk of introducing bias in our estimation. We revisit the issue shortly. For each user, we also collected account information, including the account's creation date, a list of the user's interests, and demographic information.
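The crawl above is described only as BFS-based. As a minimal sketch, assuming a seed user inside the LCC and a hypothetical `fetch_neighbors(user)` callback that returns every user linked to `user` by an edge in either direction, a breadth-first traversal visits exactly the seed's connected component:

```python
from collections import deque
from typing import Callable, Iterable, Set


def bfs_crawl(seed: str, fetch_neighbors: Callable[[str], Iterable[str]]) -> Set[str]:
    """Breadth-first crawl from `seed`; returns every user reachable from it.

    `fetch_neighbors` is a hypothetical stand-in for fetching a user's page and
    extracting linked users (friends and followers, i.e. edges in both directions).
    """
    visited = {seed}
    frontier = deque([seed])
    while frontier:
        user = frontier.popleft()          # oldest discovered user first: BFS order
        for neighbor in fetch_neighbors(user):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(neighbor)
    return visited
```

Treating edges as undirected here is an assumption; the text does not say whether the crawl followed friend links, follower links, or both.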

More formally, let G = (V, E) represent the Buzznet social graph, where V is the set of users and E is the set of edges. Given two users u, v ∈ V and an edge u → v ∈ E, u is one of v's followers, and v is one of u's friends.

Definition 1. User u's in degree is $k_i(u) = |\{v \in V : (v \to u) \in E\}|$, i.e., the number of u's followers.

Definition 2. User u's out degree is $k_o(u) = |\{v \in V : (u \to v) \in E\}|$, i.e., the number of u's friends.
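Both degrees can be read directly off the edge list. A sketch, assuming edges are given as hypothetical `(u, v)` string pairs meaning u → v (u follows v):

```python
from collections import Counter
from typing import Iterable, Tuple


def degrees(edges: Iterable[Tuple[str, str]]) -> Tuple[Counter, Counter]:
    """Return (in_degree, out_degree) counters for directed edges u -> v.

    Each distinct edge u -> v adds one to k_o(u) and one to k_i(v). Duplicate
    edges are ignored so that each count is a number of distinct users.
    """
    in_deg: Counter = Counter()
    out_deg: Counter = Counter()
    for u, v in set(edges):
        out_deg[u] += 1
        in_deg[v] += 1
    return in_deg, out_deg
```

Because `Counter` returns 0 for missing keys, users with no followers (or no friends) report a degree of 0 without special handling.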

The CDF of in degree for users in our crawl appears in Figure 1. Note that it follows the familiar power-law distribution, with the largest group of users having an in degree of 1, and a very small fraction of users having a much larger in degree.
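CDFs such as Figure 1 are empirical distribution functions. A minimal sketch of computing one from a list of values (the function name `ecdf` is illustrative):

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def ecdf(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (x, F(x)) pairs, where F(x) is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for x in sorted(set(ordered)):
        # bisect_right counts how many sorted values are <= x.
        points.append((x, bisect_right(ordered, x) / n))
    return points
```

Plotting these points with a logarithmic x-axis gives the kind of curve shown in Figure 1.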

To measure the active lifetime of a user, we look at the date on which the user makes his or her last recorded activity on the site. Since our crawl did not obtain all available data, we needed to ensure that the data we did crawl are representative of usage on the site as a whole. To do so, we crawled all activity for a sample of 9,000 users chosen at random from the LCC discovered in our initial crawl. For these users, we compared the date of the last activity in the subset of data in our initial crawl, i.e., notes, photos, and photo comments, to the date of the actual last activity. For 98% of the users in this sample the two dates were identical, and the mean difference between them was 2.1 days. Thus, at least for the purpose of dating users' last activity, the subset of data we crawled is representative of activity on the site as a whole.
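The validation step amounts to comparing two dates per sampled user. A sketch under stated assumptions: `subset_last` and `full_last` are hypothetical mappings from user ID to the last-activity date found in the partial and full crawls respectively, and only users present in both are compared:

```python
from datetime import date
from statistics import mean
from typing import Dict, Tuple


def compare_last_activity(subset_last: Dict[str, date],
                          full_last: Dict[str, date]) -> Tuple[float, float]:
    """Return (fraction of users whose dates match, mean difference in days)."""
    users = subset_last.keys() & full_last.keys()
    # The full crawl contains every activity in the subset, so each difference is >= 0.
    diffs = [(full_last[u] - subset_last[u]).days for u in users]
    same = sum(1 for d in diffs if d == 0) / len(diffs)
    return same, mean(diffs)
```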

Definition 3. A user's active lifetime is the number of days between the user's profile creation date and the last recorded activity from the user.

In addition to active lifetimes, we are interested in when users engage in passive activity. We attempt to capture this with the following definitions:

[Fig. 1. CDF of in degree. x-axis: in degree (log scale); y-axis: CDF]

Definition 4. A user's passive lifetime is the number of days between the user's last recorded activity and the last date on which the user was logged in.

Definition 5. A user's total lifetime is the number of days between the user's profile creation date and the last date on which the user was logged in.
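Definitions 3 to 5 reduce to date differences, with total lifetime equal to active plus passive lifetime by construction. A sketch, assuming each user's three dates are available as Python `date` objects in a hypothetical `UserDates` record:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class UserDates:
    created: date        # profile creation date
    last_activity: date  # last recorded activity (notes, photos, photo comments)
    last_online: date    # last date on which the user was logged in


def active_lifetime(d: UserDates) -> int:
    return (d.last_activity - d.created).days      # Definition 3


def passive_lifetime(d: UserDates) -> int:
    return (d.last_online - d.last_activity).days  # Definition 4


def total_lifetime(d: UserDates) -> int:
    return (d.last_online - d.created).days        # Definition 5
```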

The definition of active lifetime does not distinguish between undirected activity, e.g., photos posted by a user to his or her own profile, and directed activity, e.g., comments on another user's photo. In some cases, we need to distinguish directed activity.

Definition 6. Activity sent by a user u is activity generated by u and directed to another user v, v ≠ u.

Definition 7. Activity received by a user u is activity generated by another user v, v ≠ u, and directed to u.

In the data in our crawl, directed activity includes notes from one user to another user and comments on a photo from a user other than the photo's poster.
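Definitions 6 and 7 can be expressed as filters over activity records. The sketch below assumes a hypothetical `Activity` record holding the generating user (`actor`) and the user the activity is directed to (`target`); a photo posted to one's own profile has `actor == target` and so counts as neither sent nor received:

```python
from typing import Iterable, List, NamedTuple


class Activity(NamedTuple):
    actor: str   # user who generated the activity
    target: str  # user whose profile or photo the activity is attached to


def sent_by(u: str, activities: Iterable[Activity]) -> List[Activity]:
    # Definition 6: generated by u and directed to some v != u
    return [a for a in activities if a.actor == u and a.target != u]


def received_by(u: str, activities: Iterable[Activity]) -> List[Activity]:
    # Definition 7: generated by some v != u and directed to u
    return [a for a in activities if a.target == u and a.actor != u]
```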

Because the behavior of the most popular users differs from that of other users, we defined a class of the most popular users, which we termed Celebrities.

Definition 8. A Celebrity is a user whose in degree is above the 99.99th percentile.

There were 101 such users in our crawl, with a minimum in degree of 3,032 and a maximum in degree of over 180,000. Edges incident on Celebrities represent approximately 31% of all edges in our crawl. We then defined two more classes based on this definition of Celebrity:

Definition 9. A Pure Fan is a user who follows only Celebrities.

Definition 10. A Mixed user is a user who is not a Celebrity and who follows at least one non-Celebrity.
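The three classes can be computed from the in degree and the follow lists. This sketch makes two assumptions the text does not settle: it uses a nearest-rank percentile for the Celebrity threshold, and it treats the classes as disjoint by checking Celebrity first. The names `celebrities` and `classify`, and the "Other" label, are illustrative:

```python
import math
from typing import Dict, Set


def celebrities(in_deg: Dict[str, int], pct: float = 99.99) -> Set[str]:
    """Users whose in degree is above the `pct` percentile (Definition 8)."""
    values = sorted(in_deg.values())
    # Nearest-rank percentile: smallest value with at least pct% of users at or below it.
    rank = max(1, math.ceil(pct / 100 * len(values)))
    threshold = values[rank - 1]
    return {u for u, k in in_deg.items() if k > threshold}


def classify(u: str, following: Dict[str, Set[str]], celebs: Set[str]) -> str:
    """Assign u to one class; `following[u]` is the set of users u follows."""
    followees = following.get(u, set())
    if u in celebs:
        return "Celebrity"
    if followees and followees <= celebs:   # Definition 9: follows only Celebrities
        return "Pure Fan"
    if followees - celebs:                  # Definition 10: follows a non-Celebrity
        return "Mixed"
    return "Other"                          # hypothetical label, e.g. follows no one
```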

We will make use of these classes in our results, which we present next.

IV. RESULTS

In this section, we present the active lifetimes of users in our crawl, followed by predictors of their active lifetimes. We then explore the relationship between active lifetimes and passive lifetimes.

[Fig. 2. CDF of active lifetime. x-axis: profile age at last activity (days, log scale); y-axis: CDF]

TABLE I
CORRELATION BETWEEN ACTIVE LIFETIME AND VARIOUS MEASURES

Measure | Correlation coefficient
In degree | 0.03
Out degree | 0.02
Mean clustering coefficient | 0.00
Number of account details supplied | 0.11
Last received activity age | 0.58
First received activity age | 0.10

A. Active Lifetime

A CDF of active lifetimes for users with any recorded activity is shown in Figure 2. Unlike many phenomena in social networks, the active lifetime does not follow a power-law distribution: the largest lifetimes are not especially large, since they are bounded by the lifetime of the social network itself. Still, little to no activity is the norm: nearly 2/3 of the users we collected had no recorded activity. Of those users who do have recorded activity, the majority of it is early in their profiles' lifetimes. One third of users with any activity had activity only on the day they created their profiles, and half had activity only within the first nine days of their profiles' existence.

B. Predictors of Active Lifetime

In this section, we investigate predictors of users' active lifetimes, grouped by hypotheses relating to various properties of the users and their activity.

1) Graph structural properties:

Hypothesis 1. A high degree predicts a long lifetime.

Hypothesis 1.1. A high in degree predicts a long lifetime.

The intuition behind the hypothesis is that more popular users are active longer than less popular users.

Hypothesis 1.2. A high out degree predicts a long lifetime.

The intuition behind the hypothesis is that users who follow many users may be more engaged in the OSN, and therefore have a longer lifetime.

TABLE II
CORRELATION BETWEEN LIFETIME AND VARIOUS MEASURES FOR USERS BELOW 99TH PERCENTILE

Measure | Correlation coefficient
In degree | 0.94
Out degree | 0.90
Mean clustering coefficient | -0.65
Number of account details supplied | 0.68

[Fig. 3. Mean active lifetime vs. in degree. x-axis: in degree (log scale); y-axis: mean active lifetime (days)]

Table I shows that the correlation between in or out degree and active lifetime is near zero, suggesting no linear relationship between them. The correlation coefficients do not tell the whole story, however. Figure 3 plots mean active lifetime against users' in degree; the plot of active lifetime against out degree is similar and is omitted for brevity. As can be seen, there is a strong correlation between degree and mean active lifetime for smaller degrees, but as the degrees get larger there is a large amount of variation. This variation corresponds to a decrease in density at larger degrees: the 99th percentile of in degree is 164 and the 99th percentile of out degree is 189, so relatively few users have larger degrees. Table II shows the correlation coefficient between median active lifetime and the in and out degrees for users whose degrees are below the 99th percentile. For the overwhelming majority of users, then, the correlation between degree and lifetime in Buzznet is strong.
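One way to read Table II is as a correlation between each degree value and the mean (or, as the text also mentions, median) lifetime of all users with that degree. A sketch under that interpretation, with hypothetical `degree` and `lifetime` mappings from user ID to value and a `cutoff` set to the 99th-percentile degree (164 for in degree, 189 for out degree):

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def degree_lifetime_correlation(degree: Dict[str, int], lifetime: Dict[str, int],
                                cutoff: int,
                                agg: Callable[[List[int]], float] = mean) -> float:
    """Correlate each degree k < cutoff with agg() of the lifetimes of users of degree k."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for u, k in degree.items():
        if k < cutoff and u in lifetime:
            groups[k].append(lifetime[u])
    ks = sorted(groups)
    return pearson(ks, [agg(groups[k]) for k in ks])
```

Passing `agg=statistics.median` gives the median variant. Note that correlating per-degree aggregates smooths away within-degree variation, which is one reason such a coefficient can be far larger than the per-user values in Table I.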

Hypothesis 2. A high clustering coefficient predicts a long lifetime.

The hypothesis is that users who are more highly interconnected with their friends are more likely to remain active on the site. As Table I shows, there is no correlation between active lifetime and clustering coefficient. Again, the correlation does not tell the whole story. Because the clustering coefficient takes continuous rather than discrete values, we placed users into bins of width 0.01 based on their clustering coefficient and computed the mean and median active lifetime of the users in each bin. The results are shown in Figure 4. Contrary to our hypothesis, lifetime appears to decrease as the clustering coefficient increases, except at the smallest values of clustering coefficient. Because this did not match our expectation, we investigated whether another variable, in degree, might correlate with clustering coefficient. The intuition behind a relationship between in degree and clustering coefficient is that more popular users, those with high in degree, are less likely than less popular users to be arranged in tight clusters with their followers. The results of our comparison between in degree and clustering coefficient are in Figure 5. As can be seen, there is a relatively strong negative correlation between in degree and clustering coefficient. Since in degree and lifetime correlate positively, a negative correlation between clustering coefficient and lifetime is to be expected.

[Fig. 4. Active lifetime vs. clustering coefficient. x-axis: clustering coefficient; y-axis: active lifetime (days); series: mean, median]

[Fig. 5. Mean clustering coefficient vs. in degree. x-axis: in degree (log scale); y-axis: clustering coefficient]
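The binning behind Figure 4 can be sketched as follows, assuming a hypothetical mapping `cc` from user ID to a precomputed clustering coefficient in [0, 1]; how the coefficient itself was computed on the directed graph is not specified in the text, so this sketch takes it as given:

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, List, Tuple


def bin_by_clustering(cc: Dict[str, float], lifetime: Dict[str, int],
                      width: float = 0.01) -> Dict[float, Tuple[float, float]]:
    """Return {bin lower edge: (mean lifetime, median lifetime)} for non-empty bins."""
    n_bins = round(1 / width)
    bins: Dict[int, List[int]] = defaultdict(list)
    for u, c in cc.items():
        if u not in lifetime:
            continue
        # The small epsilon absorbs floating-point error at bin edges (e.g. 0.29 / 0.01);
        # min() folds c == 1.0 into the last bin instead of giving it a bin of its own.
        idx = min(int(c / width + 1e-9), n_bins - 1)
        bins[idx].append(lifetime[u])
    return {round(i * width, 10): (mean(v), median(v)) for i, v in sorted(bins.items())}
```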

Still, as we saw with degree, the relationship becomes less clear as the in degree gets very large. To isolate whether active lifetime and clustering coefficient are related for users with smaller in degree, we repeated the comparison for users whose in degree is below the 99th percentile, shown in Table II. Even for these users, there is a relatively strong negative correlation between clustering coefficient and average lifetime, suggesting that a user's popularity, as measured by in degree, has a stronger impact on lifetime than does the user's degree of connectedness to his or her friends.

Hypothesis 3. Whom a user follows influences the user's lifetime.

The intuition behind this hypothesis is that whom you follow influences whether you are likely to be, and stay, active in an OSN. For example, if you follow people you know in real life, you may be more likely to interact with them online than if you follow strangers.

To investigate this hypothesis, we calculated the average active lifetimes for the user classes Celebrities, Pure Fans, and Mixed users, shown in Table III. Unsurprisingly, the Celebrities have the longest active lifetimes. Intriguingly, the Pure Fans have much shorter mean and median active lifetimes than the Mixed users. The data support the hypothesis that whom you follow influences your likelihood of being active on the site.

TABLE III
AVERAGE ACTIVE LIFETIME IN DAYS

Class | Mean | Std. Dev. | Median
Celebrities | 793 | 400 | 822
Pure Fans | 35 | 124 | 0
Mixed | 169 | 267 | 44

[Fig. 6. Probability of being active vs. number of communication partners. x-axis: number of communication partners; y-axis: P(user active at n = 180 days)]

2) User behavior:

Hypothesis 4. The number of unique communication partners predicts lifetime.

In this hypothesis, a communication partner is a user to whom a user either sends a note or on whose photo a user comments. The intuition behind this hypothesis is that users who communicate with a variety of users over their lifetimes are more likely to remain active than users who communicate with fewer partners. To test the hypothesis, we chose 5 samples of 10,000 users each. For each sample, we calculated the mean probability of the users being active 180 days after their profiles were created. (Sampling was done in order to show how the confidence in the mean probability changes as the number of unique communication partners increases.) Figure 6 plots this probability against the number of unique communication partners. As can be seen, the probability of being active after 180 days increases with the number of communication partners, though the error also grows as the number of partners increases. In other words, there is some support for the hypothesis, though the predictive power of the number of communication partners is not very strong.
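The sampling procedure for Figure 6 might look like the following sketch. It assumes (our interpretation) that a user counts as active 180 days after profile creation when his or her active lifetime is at least 180 days, that users with no recorded activity have lifetime 0, and that samples are drawn uniformly at random; `partners` and `lifetime` are hypothetical mappings from user ID:

```python
import random
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, List, Tuple


def p_active_by_partners(partners: Dict[str, int], lifetime: Dict[str, int],
                         horizon: int = 180, n_samples: int = 5,
                         sample_size: int = 10_000, seed: int = 0
                         ) -> Dict[int, Tuple[float, float]]:
    """For each partner count k: (mean, std dev) across samples of the fraction
    of sampled users with k partners whose active lifetime reaches `horizon`."""
    rng = random.Random(seed)
    users = sorted(partners)
    per_k: Dict[int, List[float]] = defaultdict(list)
    for _ in range(n_samples):
        sample = rng.sample(users, min(sample_size, len(users)))
        groups: Dict[int, List[bool]] = defaultdict(list)
        for u in sample:
            # Users with no recorded activity are treated as having lifetime 0.
            groups[partners[u]].append(lifetime.get(u, 0) >= horizon)
        for k, flags in groups.items():
            per_k[k].append(sum(flags) / len(flags))
    # The spread across samples serves as a rough error bar for each k.
    return {k: (mean(v), stdev(v) if len(v) > 1 else 0.0)
            for k, v in sorted(per_k.items())}
```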

3) Activity timing:

Hypothesis 5. When a user last receives activity predicts lifetime.

[Fig. 7. Sent items vs. received items. x-axis: number of items received (log scale); y-axis: number of items sent (log scale)]

The intuition behind this hypothesis is that usage of the OSN is mainly spurred by activity within the OSN rather than by unrelated external activity. To investigate this hypothesis, we compared the last date on which a user received any activity to the same user's active lifetime. As Table I shows, the correlation between these two ages is positive, 0.58, which gives the hypothesis some support. The relationship between received activity and sent activity lends further support. Figure 7 shows the number of items sent vs. the number of items received for all users in our crawl, on a log-log scale. The correlation coefficient between the log of the number of items sent and the log of the number of items received is 0.76. In other words, there is a clear relationship between the number of items sent and the number of items received, as well as a relationship between the date on which the last item is received and the date on which the user is last active. This supports the hypothesis that received activity within the OSN encourages further activity within the OSN.
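The 0.76 figure is a correlation on log-transformed counts. A sketch, assuming a Pearson coefficient and hypothetical `sent` and `received` mappings from user ID to item counts; users with zero items on either side are excluded because log 0 is undefined, consistent with Figure 7's axes starting at 1:

```python
import math
from statistics import mean
from typing import Dict


def log_log_correlation(sent: Dict[str, int], received: Dict[str, int]) -> float:
    """Pearson correlation of log10(items sent) with log10(items received)."""
    pairs = [(math.log10(sent[u]), math.log10(received[u]))
             for u in sent.keys() & received.keys()
             if sent[u] > 0 and received[u] > 0]
    xs, ys = zip(*pairs)  # raises ValueError if no user qualifies
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

The choice of logarithm base does not affect the coefficient, since changing base only rescales both axes.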

Hypothesis 6. When a user first receives activity predicts lifetime.

A related hypothesis is that the first date on which a user receives activity is related to the user's lifetime. There are several competing hypotheses regarding the relationship between receipt of the first activity and the active lifetime of the user:

Hypothesis 6.1. A user will be relatively inactive until receiving his or her first message or comment from another user.

A positive correlation between the first received activity date and the user's lifetime would tend to support this hypothesis.

Hypothesis 6.2. A user will only be active if the first activity he or she receives happens relatively soon after creating his or her profile.

A negative correlation between the first received activity date and the user's lifetime would tend to support this hypothesis, although a correlation near zero would also be consistent with it.

Hypothesis 6.3. A user's lifetime is unaffected by the date on which he or she first receives activity.

[Fig. 8. Probability of being active n days after first activity. x-axis: date of first received activity (days); y-axis: probability of being active; series: n = 30, n = 180]

A correlation near zero between the first received activity date and the user's lifetime could suggest that no relationship between the two exists.

Table I shows the correlation between the date of a user's first received activity and the date of the user's last activity. As can be seen, there is only a very weak correlation between the two dates. This tends to support either Hypothesis 6.2, that receiving activity early predicts a longer lifetime, or Hypothesis 6.3, that the date on which the first activity is received is unrelated to lifetime. To differentiate between these hypotheses, we consider the following hypothesis:

Hypothesis 7. When a user first receives activity predicts subsequent lifetime.

Our results for Hypothesis 5 suggest that late user activity is spurred by late received activity, but we wish to know whether there exists a “critical period” of received activity, i.e., whether the date of received activity influences the probability that a user remains active for some period of time. In other words, say two users received their first activity on different days: one after 10 days on the site, and another after 180 days on the site. How likely are these two users to be active 30 days after this first received activity? Figure 8 shows the probability of a user being active both 30 and 180 days after the first activity is received. As the data show, there is a clear decrease in the probability of a user remaining active as the date on which he or she receives the first activity increases. These results tend to support the hypothesis that when activity is received matters: users who receive activity early in their profiles' lifetimes are much more likely to remain active than those who do not.
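The quantity plotted in Figure 8 can be sketched as follows. The bin width for the first-received date is not given in the text, so the `bin_width` default of 10 days is a hypothetical choice, and a user is assumed to be active n days after the first received activity when his or her active lifetime extends at least that far:

```python
from collections import defaultdict
from typing import Dict


def p_active_after_first_received(first_received: Dict[str, int],
                                  lifetime: Dict[str, int], n: int,
                                  bin_width: int = 10) -> Dict[int, float]:
    """Map each bin of first-received-activity age (days since profile creation)
    to the fraction of users still active at least n days after that activity."""
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for u, f in first_received.items():
        b = (f // bin_width) * bin_width   # lower edge of the user's bin
        totals[b] += 1
        if lifetime.get(u, 0) >= f + n:
            hits[b] += 1
    return {b: hits[b] / totals[b] for b in sorted(totals)}
```

Calling it once with `n=30` and once with `n=180` yields the two series in Figure 8.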

Hypothesis 8. The amount of personal account information supplied predicts lifetime.

The intuition behind this hypothesis is that the amount of account detail supplied is an indicator of the amount of engagement with the OSN, and may correlate with the probability of remaining active. The hypothesis is important to investigate in order to see whether there is a tradeoff between privacy, in the form of not revealing too much about oneself, and activity. As Table I shows, there is only a weak correlation (0.11) between active lifetime and the number of user account details supplied. We were concerned that outliers were overwhelming the data, so we computed the active lifetime for users at or below the 99th percentile of the number of account details supplied, and computed the correlation between the number of details supplied and the mean active lifetime, given in Table II. As can be seen, there is a positive correlation between mean lifetime and the number of account details supplied, though it is not as strong as the correlation between degree and lifetime. This suggests that revealing information about oneself may be one factor that contributes to remaining active on a site.

[Fig. 9. PDF of total lifetime. x-axis: profile age (days); y-axis: fraction of users logging in]

C. Passive Lifetime

As we described in Section III, we crawled all activity for a sample of 9,000 users. As a side effect of our crawl², we obtained the last online date for approximately 450,000 users. A plot of the PDF of users' total lifetimes is shown in Figure 9. Not surprisingly, user activity decays over time: far more users last logged in early in their profiles' lifetimes than later. Figure 10 shows a CDF of the active and passive lifetimes of all users whose online date was obtained in our crawl. Recall that passive lifetime is the number of days between a user's last recorded activity and the user's last online date. As the CDF shows, a higher fraction of users have a small active lifetime than have a small passive lifetime. In other words, the expected number of days of passive activity is greater than the expected number of days of recorded activity, which is consistent with the observation of Benevenuto et al. [9] that users are mostly voyeurs.

As we described in our introduction, OSN operators are interested in promoting usage among their users. As our analysis of Hypothesis 5 showed, active usage is more valuable than passive usage, because it promotes usage among other users. A simple question remains, however: is passive usage also correlated with active usage? That is, how well does a user's active lifetime approximate his or her total lifetime? To answer this question, we calculated the error between the active and total lifetimes as a fraction of the total lifetime, i.e., (total lifetime − active lifetime) / total lifetime. By normalizing to the total lifetime, we could compare the rate of error across users with very different lifetimes. An error of 0% implies that a user had measurable activity on the same day he or she last logged in. An error of 100% implies that a user had an active lifetime of 0 days, i.e., that the user only had recorded activity on the same day he or she created the profile, but logged in at some point thereafter. A plot of the error is shown in Figure 11. As can be seen, for more than 70% of users the error is less than 5%, i.e., for the majority of users active lifetime is a good approximation of total lifetime. For a minority of users, there is a larger error between last online date and last activity date, and this error is relatively evenly distributed.

² For brevity, the details of the crawling method are omitted.

[Fig. 10. CDF of active and passive lifetimes. x-axis: lifetime (days, log scale); y-axis: CDF; series: active lifetime, passive lifetime]

[Fig. 11. Mean error of active lifetime vs. total lifetime. x-axis: fraction of online age; y-axis: fraction of users; series: all users, Pure Fans, Mixed]
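The normalized error described above equals passive lifetime divided by total lifetime. A sketch, assuming hypothetical `(active, total)` lifetime pairs in days; users with a total lifetime of 0 days are skipped because the ratio is undefined for them, a choice the text does not specify:

```python
from typing import Iterable, List, Tuple


def normalized_errors(lifetimes: Iterable[Tuple[int, int]]) -> List[float]:
    """(active, total) lifetimes in days -> (total - active) / total per user."""
    return [(total - active) / total for active, total in lifetimes if total > 0]


def fraction_below(errors: List[float], threshold: float = 0.05) -> float:
    """Fraction of users whose normalized error is below `threshold` (5% by default)."""
    return sum(1 for e in errors if e < threshold) / len(errors)
```

Applied per class (all users, Pure Fans, Mixed), `fraction_below` gives figures comparable to the 70%, 94%, and 41% quoted in this section.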

Intriguingly, 94% of Pure Fans had an error below 5%, while only 41% of Mixed users had such a low error. In other words, Pure Fans were much less likely to be online except when they had activity, and as we showed in Hypothesis 3, they were also less likely than other users to have activity.

While such broad distinctions about user lifetimes are interesting, we are more interested in whether the events that predict users' logging in can be identified. To do so, we looked at users whose last online date was later than their last activity date, i.e., users with positive passive lifetime. We restricted ourselves to users whose last online date was within the range of our initial crawl, so that all activity related to them could be identified. There were approximately 130,000 such users. We then looked at two types of activity, received activity and friends' activity, as predictors of logging in.

Hypothesis 9. Received activity predicts passive lifetime.

As we discussed in Hypothesis 5, received activity appears to be correlated with (sent) activity. But is it also correlated with passive activity? Ideally, to answer this we would compare all online times with the times of received activity, in order to compare the probability of being online with the probability of receiving activity. Because we only know the date each user was last online, we know only that the probability the user was online was 1 on the date the user last generated activity, 1 on the user’s last online date, indeterminate in between, and 0 thereafter. In other words, to compare the probability of receiving activity to the probability of logging in, we would have to know the probability of logging in during precisely the period we do not know. Instead, we focus on the probability of receiving activity near the user’s last login date. For the 130,000 users with positive passive lifetime whose last online date was within the range of our initial crawl, we computed the probability of the user receiving activity on each day after the user’s last generated activity. If the hypothesis is true, we would expect the probability of receiving activity to be highest shortly before or on the user’s online dates, and generally lower after the user’s last online date.

Figure 12 shows a plot of the probability of receiving activity on each day after the user’s last recorded activity, normalized to the user’s last online date, shown as day 0. Activity received on negative days reflects activity received during the users’ passive lifetimes, i.e. before the user last logged in, while activity received on positive days reflects activity received after the user last logged in.
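A minimal sketch of this day-offset estimate follows. It assumes that each day after a user’s last generated activity, up to the end of the crawl, counts as one observation at offset (day − last online day), and that the probability at each offset is the fraction of such user-days on which activity was received. The integer day encoding, the tuple input format, and the per-offset normalization are assumptions; the paper does not state exactly how the probabilities were aggregated.

```python
from collections import Counter


def receive_probability_by_offset(users, crawl_end: int) -> dict[int, float]:
    """users: iterable of (last_activity, last_online, received_days), where
    days are integers (e.g. days since an arbitrary epoch) and received_days
    is the set of days on which the user received activity.

    Returns, for each offset k = day - last_online, the fraction of observed
    user-days at offset k on which activity was received.
    """
    observed = Counter()  # user-days observed at each offset
    hits = Counter()      # user-days with received activity at each offset
    for last_activity, last_online, received_days in users:
        # Only days strictly after the user's last generated activity count.
        for day in range(last_activity + 1, crawl_end + 1):
            offset = day - last_online
            observed[offset] += 1
            if day in received_days:
                hits[offset] += 1
    return {k: hits[k] / observed[k] for k in sorted(observed)}
```

Dividing by the number of observed user-days at each offset, rather than reporting raw counts, keeps offsets that many users reach (such as the overrepresented days after the last login) from dominating the curve.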

For the 130,000 users with positive passive lifetime, the probability of receiving activity was highest, by a clear margin, on day 0, i.e. on the same day the user last logged in. While the probability itself was not large (approximately 2.5%), the probability of any user receiving activity on any day is very small, approximately 0.5%. There is a curious spike in the probability of receiving activity approximately 130 days before the users’ last login date. We have no explanation for this other than that it may be an anomaly due to a relatively small dataset. Still, the probability distribution is remarkably regular. The probability of a user receiving activity after logging in is also much lower than the probability of a user receiving activity before the last time he or she logged in, even though the data after logging in are overrepresented: on average, users have more days after their last login than days in their passive lifetimes. Thus, we conclude that this hypothesis is likely, i.e. that receiving activity predicts passive behavior.

Hypothesis 10. Undirected activity among a user’s friends predicts passive lifetime.

A further test is whether passive activity is predicted by undirected activity among a user’s friends. In our crawled data, posted photos are undirected activity. To determine whether posted photos have any impact on logging in, we computed the probability of a user’s friends posting photos on each day after the user’s last generated activity. If the hypothesis is true, we would expect the probability that a user’s friends posted photos to be higher shortly before the user’s online dates, and generally lower after the user’s last online date.
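For undirected activity, the extra step is collapsing all of a user’s friends’ photo posts into a single per-day indicator. The sketch below assumes hypothetical `friends` and `photo_days` dictionaries keyed by user id, with integer day numbers. It stops at (offset, indicator) pairs, which would then be tallied by offset from the last online date as in the received-activity analysis.

```python
def friend_photo_days(user, friends, photo_days):
    """Days on which at least one of `user`'s friends posted a photo.

    friends: dict mapping user id -> iterable of friend ids (hypothetical)
    photo_days: dict mapping user id -> set of integer days with a photo post
    """
    days = set()
    for friend in friends.get(user, ()):
        days |= photo_days.get(friend, set())
    return days


def photo_indicator_by_offset(last_activity, last_online, crawl_end, days):
    """(offset from last online day, any friend posted?) for each day after
    the user's last generated activity, up to the end of the crawl."""
    return [
        (day - last_online, day in days)
        for day in range(last_activity + 1, crawl_end + 1)
    ]
```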

Figure 13 shows a plot of the probability of users’ friends posting photos on each day after the user’s last recorded activity, again normalized to the user’s last online date at day 0. Photos posted on negative days reflect photos posted during the users’ passive lifetimes, while photos posted on positive days reflect photos posted after the users last logged in.

[Plot: probability of receiving activity (y-axis, 0 to 0.025) vs. days after last online date (x-axis, -300 to 300).]

Fig. 12. Last online date vs. received activity

[Plot: probability friends posted photos (y-axis, 0 to 0.004) vs. days after last online date (x-axis, -300 to 300).]

Fig. 13. Last online date vs. photos posted by friends

As with received activity, the probability of a user’s friends posting photos is highest on day 0, i.e. on the same day the user last logged in. Again, the probability itself is quite low, even lower than the probability of receiving activity: the peak probability is about 0.4%. The baseline is also low: the probability of any user posting photos on any day is approximately 0.2% on average. The probability of a user’s friends posting photos is also lower after the user last logged in than before. We conclude that this hypothesis is likely, i.e. that undirected activity among a user’s friends also predicts passive lifetime, though perhaps not as strongly as directed activity.
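One rough way to compare the two predictors is the ratio of each day-0 peak to the corresponding average daily probability. The calculation below uses only the approximate values quoted in the text. It is an illustrative back-of-the-envelope comparison, not a measure the analysis itself reports.

```latex
\[
\text{lift}_{\text{received}} \approx \frac{0.025}{0.005} = 5,
\qquad
\text{lift}_{\text{photos}} \approx \frac{0.004}{0.002} = 2
\]
```

The smaller ratio for friends’ photos is in line with the observation that undirected activity appears to be a weaker predictor than directed activity.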

V. RECOMMENDATIONS

The premise of this work is that OSN operators are interested in promoting continued activity among their users or followers, and that the factors that encourage continued activity can be analyzed. Based on the hypotheses for which we found support, we make the following recommendations:

Recommendation 1. Encourage users to form friendships.

This follows from Hypothesis 1.

Recommendation 2. Recommend that users befriend users other than the most popular ones.


This follows from Hypothesis 3, and is further supported in Section IV-C. It also suggests that luring well-known celebrities to an OSN, e.g. through partnerships, may be an ineffective means of encouraging users to remain active in the OSN. One way to encourage users to befriend users other than the most popular ones may be to implement a “People you may know” feature, which recommends users whom your friends follow. Another way may be to foster the creation of online groups, e.g. interest-based, geography-based, or event-based groups, to create forums in which users can meet one another online.
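One possible shape for such a “People you may know” feature is sketched below. Candidates are friends-of-friends, ranked by the number of mutual connections, and accounts above a degree cutoff are skipped so that suggestions favor users other than the most popular ones. The `friends` adjacency dictionary (treated as symmetric, rather than the follow relation mentioned above), the mutual-friend ranking, and the `max_degree` cutoff are all illustrative assumptions, not a design evaluated in this work.

```python
from collections import Counter


def suggest_friends(user, friends, max_degree, k=5):
    """Rank friends-of-friends by number of mutual friends, skipping the
    user's existing friends and any account with more than `max_degree`
    friends (a hypothetical stand-in for 'the most popular users')."""
    mine = set(friends.get(user, ()))
    counts = Counter()
    for f in mine:
        for fof in friends.get(f, ()):
            if fof == user or fof in mine:
                continue
            if len(friends.get(fof, ())) > max_degree:
                continue  # skip highly connected accounts
            counts[fof] += 1
    return [uid for uid, _ in counts.most_common(k)]
```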

Recommendation 3. Encourage users to communicate with one another.

This follows from Hypotheses 4, 5 and 9. One way is to suggest that users communicate with friends of theirs who have not been active in some time.
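Selecting which friends to suggest contacting could look like the sketch below. It assumes a hypothetical `last_activity` map from user id to the date of that user’s last recorded activity and an arbitrary 30-day idle threshold; friends with no recorded activity are simply skipped.

```python
from datetime import date, timedelta


def dormant_friends(user, friends, last_activity, today: date, idle_days=30):
    """Friends of `user` whose last recorded activity is at least
    `idle_days` before `today`; candidates for a 'say hello' prompt."""
    cutoff = today - timedelta(days=idle_days)
    return sorted(
        f for f in friends.get(user, ())
        if f in last_activity and last_activity[f] <= cutoff
    )
```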

Recommendation 4. Welcome new users to the site.

In particular, the welcome should come from existing users. This follows from Hypothesis 7. One way is to add a step to the friendship formation process, encouraging the new friends to send messages to one another or otherwise interact with one another.

Recommendation 5. Encourage users to post frequently.

This follows from Hypothesis 10. Even if a user’s posts aren’t receiving many comments, they may serve to encourage passive activity among the user’s friends.

VI. CONCLUSION AND FUTURE WORK

In this work, we studied the active and passive lifetimes of the LCC of users in one OSN. We examined the behaviors and properties that predict both active and passive usage of the site, and used these characteristics to suggest features that would promote usage among an OSN’s users.

It is tempting to speculate whether the presence or absence of features that encourage activity is sufficient to induce users’ continued activity. For example, Twitter lacks most of the features we recommend, and has a high rate of churn [10]. Facebook, in contrast, implements most of the features we recommend. While we are not aware of any estimate of Facebook’s rate of churn, Facebook itself states that more than half of active Facebook users return every month [11]. Unfortunately, this leaves open the definition of an active user. According to Nielsen, Facebook is the third most popular brand online [12]. Is Facebook’s popularity due to its features that encourage activity? Perhaps in part.

For future work, we would like to validate our findings across multiple OSNs. We would also like to evaluate the impact of implementing the suggested features on the usage of an OSN.

Acknowledgements

This research was supported in part by NSF CNS-0832202, BBN-GENI, Army Research Lab (Network Science CTA), ARO MURI (Arsenal), AFOSR MURI (Helix), and Intel.

We are grateful to the anonymous reviewers for suggesting improvements in the analysis of certain results, especially the relationship between clustering coefficient and lifetime.

REFERENCES

[1] http://themetricsystem.rjmetrics.com/2010/01/26/new-data-on-twitters-users-and-engagement/. Online; accessed 14-January-2011.

[2] C. Wilson, B. Boe, A. Sala, K. P. Puttaswamy, and B. Y. Zhao, “User interactions in social networks and their implications,” in Proceedings of the 4th ACM European conference on Computer systems, EuroSys ’09, (New York, NY, USA), pp. 205–218, ACM, 2009.

[3] S. Ye, J. Lang, and F. Wu, “Crawling online social graphs,” in Web Conference (APWEB), 2010 12th International Asia-Pacific, pp. 236–242, 2010.

[4] N. B. Ellison, C. Steinfield, and C. Lampe, “The benefits of facebook friends: social capital and college students’ use of online social network sites,” Journal of Computer-Mediated Communication, vol. 12, no. 4, pp. 1143–1168, 2007.

[5] A. N. Joinson, “Looking at, looking up or keeping up with people?: motives and use of facebook,” in Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems, CHI ’08, (New York, NY, USA), pp. 1027–1036, ACM, 2008.

[6] K. Dasgupta, R. Singh, B. Viswanathan, D. Chakraborty, S. Mukherjea, A. A. Nanavati, and A. Joshi, “Social ties and their relevance to churn in mobile telecom networks,” in Proceedings of the 11th international conference on Extending database technology: Advances in database technology, EDBT ’08, (New York, NY, USA), pp. 668–677, ACM, 2008.

[7] D. Stutzbach and R. Rejaie, “Understanding churn in peer-to-peer networks,” in Proceedings of the 6th ACM SIGCOMM conference on Internet measurement, IMC ’06, (New York, NY, USA), pp. 189–202, ACM, 2006.

[8] B. Viswanath, A. Mislove, M. Cha, and K. P. Gummadi, “On the evolution of user interaction in facebook,” in Proceedings of the 2nd ACM workshop on Online social networks, WOSN ’09, (New York, NY, USA), pp. 37–42, ACM, 2009.

[9] F. Benevenuto, T. Rodrigues, M. Cha, and V. Almeida, “Characterizing user behavior in online social networks,” in Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference, IMC ’09, (New York, NY, USA), pp. 49–62, ACM, 2009.

[10] http://blog.nielsen.com/nielsenwire/online_mobile/twitter-quitters-post-roadblock-to-long-term-growth/. Online; accessed 14-January-2011.

[11] http://facebook.com/press/info.php?statistics. Online; accessed 14-January-2011.

[12] http://blog.nielsen.com/nielsenwire/online_mobile/social-media-accounts-for-22-percent-of-time-online/. Online; accessed 14-January-2011.
