happiness and the patterns of life: a study of geolocated tweets

23
Happiness and the Patterns of Life: A Study of Geolocated Tweets Morgan R. Frank, Lewis Mitchell, Peter S. Dodds, Christopher M. Danforth * Computational Story Lab, Department of Mathematics and Statistics, Vermont Complex Systems Center, Vermont Advanced Computing Core, University of Vermont, Burlington, Vermont, United States of America * [email protected] (corresponding author) The patterns of life exhibited by large populations have been described and modeled both as a basic science exer- cise and for a range of applied goals such as reducing auto- motive congestion, improving disaster response, and even predicting the location of individuals. However, these stud- ies previously had limited access to conversation content, rendering changes in expression as a function of movement invisible. In addition, they typically use the communication between a mobile phone and its nearest antenna tower to infer position, limiting the spatial resolution of the data to the geographical region serviced by each cellphone tower. We use a collection of 37 million geolocated tweets to char- acterize the movement patterns of 180,000 individuals, tak- ing advantage of several orders of magnitude of increased spatial accuracy relative to previous work. Employing the recently developed sentiment analysis instrument known as the hedonometer, we characterize changes in word usage as a function of movement, and find that expressed happi- ness increases logarithmically with distance from an indi- vidual’s average location. A proper characterization of human mobility patterns [116] is an essential component in the development of models of urban planning [17], traffic forecasting [18], and the spread of diseases [1921]. In the modern communication era, pat- terns of human movement have been revealed at an increas- ingly higher resolution in both space and time, with mobile phone data in particular complementing existing survey-based investigations. As is the case with each new instrument mea- suring macroscale sociotechnical phenomena, the task has be- come one of understanding what discernible patterns exist, and what meaning can be derived from those patterns [3, 2224]. Scientists working to understand mobility have employed a diverse set of methodologies. Brockman et al. [7] used the cir- culation of nearly 1/2 million U.S. dollar bills whose locations were submitted by over 1 million visitors to a website [25] to demonstrate that bank note trajectories are superdiffusive in space and subdiffusive in time, i.e. moving farther and more frequently than expected. Gonzalez et al. [2] used 6 months of mobile phone data from 100,000 individuals to show that human trajectories are regular in space and time, with each individual having a high probabil- ity of returning to a few preferred locations according to Zipf’s law. Combining phone communication data with measures of community economic prosperity, Eagle et al. [3] showed that the diversity of contacts in an individual’s social network is strongly correlated to the potential for economic development exhibited by their community. Exemplifying recent work to characterize sentiment with social network communications, Mitchell et al. [26] combined traditional survey data (e.g., Gallup) with millions of tweets to correlate word usage with the demographic characteristics of U.S. urban areas. Expressed happiness was shown, for ex- ample, to correlate strongly with percentage of the population married, and anti-correlate with obesity. Words such as ‘Mc- Donalds’ and ‘hungry’ appeared far more frequently in obese cities, suggesting their instrument could be used to provide real-time feedback on social health programs such as the pro- posed ban on the sale of large sodas in New York City in 2013. In what follows, we characterize the pattern of life of over 180,000 individuals in the U.S. using messages sent via the social networking service Twitter, and employ our text-based hedonometer [27] to characterize sentiment as a function of movement. In the calendar year 2011, we collected roughly 4 billion messages through Twitter’s gardenhose feed, repre- senting a random 10% of all status updates posted during this period. Along with an abundance of other metadata, location in- formation typically accompanies each message, resulting from one of three mechanisms by which individuals can report their location when updating their status. First, when an individual registers their account with Twitter, they are presented with the opportunity to report their location in a free text box. This region will be displayed in their user profile (e.g. ‘NYC’ or ‘over the rainbow’). The metadata accompanying each tweet sent by the individual contains this self-reported location. Sec- ond, individuals submitting a message through a web browser can choose to tag their message with a ‘place’ chosen from a drop-down menu, where the first option provided is typically the city within which the computer’s IP address is found. For the purposes of accuracy, we have chosen to ignore each of these two mechanisms for reporting position when attempting to assign each tweet a geographical location, and focus instead on messages located via a third mechanism, namely the Global Positioning System (GPS). Individuals using an app provided by a mobile device may opt-in to geolocate their message, in which case the exact lat- itude and longitude of the mobile phone is reported. The ac- curacy of this information is governed by the precision of the GPS instrument embedded in the phone, which can vary de- pending on the surrounding topography. As a result of these factors, we are able to approximately place each geolocated message inside a 10 meter circle on the surface of the Earth, within which the tweet was sent. Approximately 1% of the status updates received through the gardenhose feed are ge- olocated, resulting in a total of 37 million messages, collec- tively representing more than 180,000 English-speaking peo- ple worldwide. Fig. 1 is representative of the geospatial reso- lution of the data. Results Following Gonz´ alez et al. [2], we examine the shape of hu- man mobility using radius of gyration, hereafter gyradius, as a measure of the linear size occupied by an individual’s trajec- tory. In Fig. 2, we investigate the geographical distribution of movement in four urban areas by plotting a dot for each tweet, colored by the gyradius of its author. Clockwise from the top left, cities are displayed in order of their apparent aggregate 1 arXiv:1304.1296v1 [physics.soc-ph] 4 Apr 2013

Upload: bert-kok

Post on 08-Nov-2014

372 views

Category:

Documents


0 download

DESCRIPTION

Happiness levels captured by Tweets rise logarithmically with distance from our average location, say computer scientists studying Twitter sentiment.

TRANSCRIPT

Page 1: Happiness and the Patterns of Life: A  Study of Geolocated Tweets

Happiness and the Patterns of Life: AStudy of Geolocated TweetsMorgan R. Frank, Lewis Mitchell, Peter S. Dodds,Christopher M. Danforth∗Computational Story Lab, Department of Mathematics and Statistics,Vermont Complex Systems Center, Vermont Advanced Computing Core,University of Vermont, Burlington, Vermont, United States of America∗[email protected] (corresponding author)

The patterns of life exhibited by large populations havebeen described and modeled both as a basic science exer-cise and for a range of applied goals such as reducing auto-motive congestion, improving disaster response, and evenpredicting the location of individuals. However, these stud-ies previously had limited access to conversation content,rendering changes in expression as a function of movementinvisible. In addition, they typically use the communicationbetween a mobile phone and its nearest antenna tower toinfer position, limiting the spatial resolution of the data tothe geographical region serviced by each cellphone tower.We use a collection of 37 million geolocated tweets to char-acterize the movement patterns of 180,000 individuals, tak-ing advantage of several orders of magnitude of increasedspatial accuracy relative to previous work. Employing therecently developed sentiment analysis instrument knownas the hedonometer, we characterize changes in word usageas a function of movement, and find that expressed happi-ness increases logarithmically with distance from an indi-vidual’s average location.

A proper characterization of human mobility patterns [1–16] is an essential component in the development of modelsof urban planning [17], traffic forecasting [18], and the spreadof diseases [19–21]. In the modern communication era, pat-terns of human movement have been revealed at an increas-ingly higher resolution in both space and time, with mobilephone data in particular complementing existing survey-basedinvestigations. As is the case with each new instrument mea-suring macroscale sociotechnical phenomena, the task has be-come one of understanding what discernible patterns exist, andwhat meaning can be derived from those patterns [3, 22–24].

Scientists working to understand mobility have employed adiverse set of methodologies. Brockman et al. [7] used the cir-culation of nearly 1/2 million U.S. dollar bills whose locationswere submitted by over 1 million visitors to a website [25] todemonstrate that bank note trajectories are superdiffusive inspace and subdiffusive in time, i.e. moving farther and morefrequently than expected.

Gonzalez et al. [2] used 6 months of mobile phone data from100,000 individuals to show that human trajectories are regularin space and time, with each individual having a high probabil-ity of returning to a few preferred locations according to Zipf’slaw. Combining phone communication data with measures ofcommunity economic prosperity, Eagle et al. [3] showed thatthe diversity of contacts in an individual’s social network isstrongly correlated to the potential for economic developmentexhibited by their community.

Exemplifying recent work to characterize sentiment withsocial network communications, Mitchell et al. [26] combined

traditional survey data (e.g., Gallup) with millions of tweetsto correlate word usage with the demographic characteristicsof U.S. urban areas. Expressed happiness was shown, for ex-ample, to correlate strongly with percentage of the populationmarried, and anti-correlate with obesity. Words such as ‘Mc-Donalds’ and ‘hungry’ appeared far more frequently in obesecities, suggesting their instrument could be used to providereal-time feedback on social health programs such as the pro-posed ban on the sale of large sodas in New York City in 2013.

In what follows, we characterize the pattern of life of over180,000 individuals in the U.S. using messages sent via thesocial networking service Twitter, and employ our text-basedhedonometer [27] to characterize sentiment as a function ofmovement. In the calendar year 2011, we collected roughly4 billion messages through Twitter’s gardenhose feed, repre-senting a random 10% of all status updates posted during thisperiod.

Along with an abundance of other metadata, location in-formation typically accompanies each message, resulting fromone of three mechanisms by which individuals can report theirlocation when updating their status. First, when an individualregisters their account with Twitter, they are presented withthe opportunity to report their location in a free text box. Thisregion will be displayed in their user profile (e.g. ‘NYC’ or‘over the rainbow’). The metadata accompanying each tweetsent by the individual contains this self-reported location. Sec-ond, individuals submitting a message through a web browsercan choose to tag their message with a ‘place’ chosen from adrop-down menu, where the first option provided is typicallythe city within which the computer’s IP address is found. Forthe purposes of accuracy, we have chosen to ignore each ofthese two mechanisms for reporting position when attemptingto assign each tweet a geographical location, and focus insteadon messages located via a third mechanism, namely the GlobalPositioning System (GPS).

Individuals using an app provided by a mobile device mayopt-in to geolocate their message, in which case the exact lat-itude and longitude of the mobile phone is reported. The ac-curacy of this information is governed by the precision of theGPS instrument embedded in the phone, which can vary de-pending on the surrounding topography. As a result of thesefactors, we are able to approximately place each geolocatedmessage inside a 10 meter circle on the surface of the Earth,within which the tweet was sent. Approximately 1% of thestatus updates received through the gardenhose feed are ge-olocated, resulting in a total of 37 million messages, collec-tively representing more than 180,000 English-speaking peo-ple worldwide. Fig. 1 is representative of the geospatial reso-lution of the data.Results

Following Gonzalez et al. [2], we examine the shape of hu-man mobility using radius of gyration, hereafter gyradius, asa measure of the linear size occupied by an individual’s trajec-tory. In Fig. 2, we investigate the geographical distribution ofmovement in four urban areas by plotting a dot for each tweet,colored by the gyradius of its author. Clockwise from the topleft, cities are displayed in order of their apparent aggregate

1

arX

iv:1

304.

1296

v1 [

phys

ics.

soc-

ph]

4 A

pr 2

013

Page 2: Happiness and the Patterns of Life: A  Study of Geolocated Tweets

Figu

re1:

Eac

hpo

intc

orre

spon

dsto

age

oloc

ated

twee

tpos

ted

in20

11.T

witt

erac

tivity

ism

osta

ppar

enti

nur

ban

area

s.N

ote

that

the

imag

eco

ntai

nsno

cart

ogra

phic

bord

ers,

sim

ply

asm

alld

otfo

reac

hm

essa

ge.L

egen

d:A

(U.S

.),B

(Was

hing

ton,

D.C

.),C

(Los

Ang

eles

,C.A

.),an

dD

(Ear

th).

2

Page 3: Happiness and the Patterns of Life: A  Study of Geolocated Tweets

Figure 2: (Color online) The gyradius, calculated for each individual, is shown for each tweet authored in four example cities.Reflecting the pattern of urban life, we find messages authored by large radius individuals to be more likely to appear in themain downtown area of each city, while messages authored by small radius individuals tend to appear outside of the main urbanarea. Histograms of gyradii for each city are shown in Fig. S1, along with tweet locations colored by distance from expectedlocation (Fig. S2). Note that higher resolution versions of the four panels above can be found online [28].

3

Page 4: Happiness and the Patterns of Life: A  Study of Geolocated Tweets

gyradius, with New York City seemingly exhibiting a smallerradius than the San Francisco Bay Area. Reflecting the patternof urban life, we find messages authored by large radius indi-viduals to be more likely to appear in the main downtown areaof each city, while messages authored by small radius individ-uals tend to appear in less densely populated areas. For ex-ample, in Chicago, many individuals writing from downtownexhibit an order of magnitude greater radius than individualsposting in areas outside of the city.

In the greater Los Angeles area, we see several clusters ofindividuals with larger radius in downtown Los Angeles, aswell as Long Beach, Santa Monica, and Disneyland in Ana-heim, while less densely populated areas are seen as smallerclusters exhibiting much smaller radii. The San Francisco BayArea is clearly revealed by individuals with large radius, mostnotably in San Francisco, and somewhat less so in Oaklandand San Jose. Outside of these cities, there are many suburbanareas revealed by individuals with large radius, e.g. Palo Alto.Tweets appearing in less densely populated Bay Area locationsare far more likely to be authored by large radius individu-als than those appearing in lower population areas elsewhere.This observation surely reflects the socio-economic and demo-graphic characteristics of individuals using Twitter in the BayArea, where the social network service was founded. Addi-tionally, it could reflect the presence of tourists who will typ-ically have a larger radius than someone who lives and worksin the Bay Area.

The main observation apparent in Fig. 2, namely that in-dividuals who move a lot tend to appear in areas of largepopulation density, is somewhat counterintuitive. Given theapparent economies of scale offered by living in a denselypopulated area, one might expect to observe the inverse rela-tionship, namely that people living in less densely populatedareas travel further, by necessity, to their place of employ-ment or grocery store, for example. Of course, these individ-uals with large radius could be tourists, or they could havea long commute. Looking at each point colored instead bydistance from expected location (Fig. S2)) we still see moreexaggerated segregation, with non-natives appearing predomi-nantly in cities, and native individuals tweeting in the suburbs.Looking at 500 cities in the U.S., we find a moderate corre-lation between the mean gyradius and city land area (Pearsonρ = 0.24, p = 2× 10−7); Fig. S3 and Table S1 show the topand bottom cities with respect to gyradii.

To investigate the shape of human mobility, we normalizeeach individual’s trajectory to a common reference frame (seeMethods). In Fig. 3, we plot a heat map of the probabilitydensity function of the normalized locations of all individuals.For the purposes of the discussion, we will refer to deviationsfrom an individual’s expected location in the normalized refer-ence frame as occurring in the directions north, south, east, andwest. Several features of the map reveal interesting patterns ofmovement. First, the overall west-to-east teardrop shape of thecontours demonstrates that people travel predominantly alongtheir principle axis, namely heading west from the origin alongy/σy = 0, with deviations in the orthogonal direction becomingshorter and less frequent as they move farther away from the

origin.Second, the appearance of two spatially distinct yellow re-

gions separated by a less populated green region suggests thatpeople spend the vast majority of their time near two loca-tions. We refer to these locations as the work and home habi-tats [8], where the home habitat is centered on the dark redregion roughly 1 standard deviation east of the origin, and thework habitat is centered approximately 2 standard deviationswest of the origin. These locations highlight the bimodal dis-tribution of principal axis corridor messages (Fig. 4A).

Finally, a clear asymmetry is observed about the x/σx = 0axis indicating the increasingly isotropic variation in move-ment surrounding the home habitat, as compared to the workhabitat. We interpret this to be a reflection of the tendency tobe more familiar with the surroundings of one’s home, and toexplore these surroundings in a more social context (Fig. 4B).The symmetry observed when reflecting about the y/σy = 0-axis is strong, demonstrating the remarkable consistency of themovement patterns revealed by the data.

In an effort to characterize the temporal and spatial struc-ture observed in Fig. 3, in Fig. 5 we examine locations fre-quently visited by the most prolific members of our data set,namely the roughly 300 individuals for whom we received atleast 800 geolocated messages. We suspect that these individ-uals enabled the geolocating feature to be on by default forall messages, as implied by the roughly O(104) geolocatedmessages suggested by the gardenhose rate. The main fig-ure shows the probability of tweeting from each habitat, withhabitats ordered by rank, for each individual [8]. We find thatP(H(a)

i ) ∝ R(H(a)i )−1.3 which is approximately a Zipf distribu-

tion [29]. This finding indicates that regardless of the num-ber of tweet habitats for a given individual, the majority oftheir messaging activity occurs in one of only a few habitats,with the probability decaying at a predictable rate. If the decaywere Zipfian, an individual is approximately n-times as likelyto tweet from their mode location than from their rank n loca-tion. With our slope being steeper, these probabilities fall ata faster rate with rank. Note that fitting the power law modelto the leading 10 habitats, using only individuals who have atleast 10 habitats, we also get a slope of −1.3.

For roughly 95% of these individuals, each tweet has agreater than 10% chance of being authored from their mode lo-cation (Fig. 5B). Fig. 5C demonstrates each individual’s like-lihood of authoring messages from their mode location (blackcurve) at different times of day throughout the week. A period-2 cycle is observed for each day of the week. Maxima are seenin the morning (8-10am) and evening (10pm-midnight), andminima in the afternoon (2-4pm) and overnight (2-4am) hours.The peak in the morning is consistently higher than that in theevening, and the afternoon valley is consistently lower than theovernight valley. The cycle is somewhat less structured on theweekend. Also plotted are the probabilities of tweeting fromlocations other than the mode (red curve). While the shapeis quite similar to the mode location probability, we do notethat individuals tweeting at 2am are likely to be anywhere buthome.

In a study performed with cellphone tower data, Gonzalez

4

Page 5: Happiness and the Patterns of Life: A  Study of Geolocated Tweets

−20 −15 −10 −5 0 5 10 −8

−6

−4

−2

0

2

4

6

8

x/mx

y/m

y

−6

−5.5

−5

−4.5

−4

−3.5

−3

−2.5

Figure 3: (Color online) The probability density function of observing an individual in their normalized reference frame, wherethe origin corresponds to each individual’s expected location, and σy = 0 corresponds to their principle axis. This map showsthe positions of over 37,000 individuals, each with more than 50 locations, in their intrinsic reference frame.

5

Page 6: Happiness and the Patterns of Life: A  Study of Geolocated Tweets

−15 −9 −3 0 3 9 150

1

2

3

4

5

6

x/mx

Log−

10 C

ount

s

0 1 2 3 40.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

Log−10 Radius of Gyration (km)

my/m

x

Slope=−0.071

A

B

Figure 5: Looking at messages authored in the principle axis corridor, defined by | ysy

| < 301000 , �15 x

sx 15, we

observe a clear separation between the most likely and second most likely position (A). The distribution is skewed left,with movement in a heading opposite an individual’s work/home corridor observed to be highly unlikely. In addition,due to the normalization, we see that individuals are much more likely to tweet slightly east of their expected locationthan slightly west. The isotropy ratio (B) measures the change in the density’s shape as a function of gyradius, withlarge radius individuals exhibiting a less circular pattern of life. Standard errors are plotted, but are only visible for thelargest radius group. The isotropy ratio decays logarithmically with radius.

as smaller clusters exhibiting much smaller radii. The SanFrancisco Bay Area is clearly revealed by individuals withlarge radius, most notably in San Francisco, and some-what less so in Oakland and San Jose. Outside of thesecities, there are many suburban areas revealed by individ-uals with large radius, e.g. Palo Alto. Tweets appear-ing in less densely populated Bay Area locations are farmore likely to be authored by large radius individuals thanthose appearing in lower population areas elsewhere. Thisobservation surely reflects the socio-economic and demo-graphic characteristics of individuals using Twitter in theBay Area, where the social network service was founded.Additionally, it could reflect the presence of tourists whowill typically have a larger radius than someone who livesand works in the Bay Area. Note that the relationship be-tween gyradius and average word happiness for each ofthese cities is presented in the Appendix, and higher res-olution versions of the four panels above can be foundonline2.

The main observation apparent in Figure 2, namelythat individuals who move a lot tend to appear in areasof large population density, is somewhat counterintuitive.Given the apparent economies of scale offered by livingin a densely populated area, one might expect to observe

2http://www.uvm.edu/storylab/share/papers/

frank2013a

the inverse relationship, namely that people living in lessdensely populated areas travel further, by necessity, totheir place of employment or grocery store, for example.Of course, these individuals with large radius could betourists, or they could have a long commute. Looking ateach point colored instead by distance from expected lo-cation (Appendix), we still see more exaggerated segrega-tion, with non-natives appearing predominantly in cities,and native individuals tweeting in the suburbs.

In Figure 3, we plot the mean gyradius vs land areafor each city in the U.S. as defined by [30]. While thereis considerable scatter among the cities, we do observe aweak correlation indicating that individuals living in morepopulated and larger areas travel farther. Table 3 showsthe top and bottom cities with respect to mean gyradius,as well as the four cities discussed above.

In Figure 4, we plot a heat map of the probability den-sity function of the normalized locations of all individu-als. For the purposes of the discussion, we will refer todeviations from an individual’s expected location in thenormalized reference frame as occurring in the directionsnorth, south, east, and west.

Several features of the map reveal interesting patternsof movement. First, the overall west-to-east teardropshape of the contours demonstrates that people travel pre-dominantly along their principle axis, namely heading

8

Figure 4: Looking at messages authored in the principle axis corridor, defined by | yσy| < 30

1000 , we observe a clear separationbetween the most likely and second most likely position (A). The distribution is skewed left, with movement in a headingopposite an individual’s work/home corridor observed to be highly unlikely. In addition, due to the normalization, we see thatindividuals are much more likely to tweet slightly east of their expected location than slightly west. The isotropy ratio (B) mea-sures the change in the density’s shape as a function of gyradius, with large radius individuals exhibiting a less circular patternof life. Standard errors are plotted, but are only visible for the largest radius group. The isotropy ratio decays logarithmicallywith radius.

et al. [2] found that people spend most of their time in two lo-cations, and a person’s probability of being found at a separatelocation diminishes rapidly with rank by visitation. While ourinvestigation reveals a similar pattern, we find a larger differ-ence in the probability that an individual is tweeting from thehome habitat than from the work habitat. We attribute theseslight differences in our results to the different spatiotempo-ral precision of location data, as well as differences in activi-ties represented by the data. Gonzalez et al. determined eachindividual’s location by continuously monitoring the nearestcellphone tower whose range they were within. As such, wereceive more precise location information, but only when in-dividuals performed the act of tweeting.

One major advantage of using Twitter data to study move-ment is the additional source of information provided by themessages themselves. Researchers using mobile phone data tocharacterize mobility patterns do not have access to conversa-tions occurring during the time period of interest. To measurethe sentiment associated with different patterns of movement,we use the hedonometer introduced by Dodds et al. in [27].The instrument performs a context-free measurement of thehappiness of a large collection of words using the language as-sessment by Mechanical Turk (labMT) word list, as describedin [27, 30]. LabMT comprises roughly 10,000 of the most fre-quently used words in the English language, each of whichwas scored for happiness on a scale of 1 (sad) to 9 (happy) bypeople using Amazon’s Mechanical Turk service [31, 32], re-sulting in an average happiness score for each word. Exampleword scores are shown in Table 1.

To examine the relationship between movement and happi-ness, we calculate expressed happiness as a function of dis-tance from an individual’s expected location, as well as gy-radius. For the former, we grouped tweets into ten equallypopulated bins, with each group containing more than 500,000tweets from similar distances. The happiness of each groupwas then computed using Eqn 3, where all words written froma given distance were gathered into a single bin. For the latter,we placed individuals into ten equally sized groups by gyra-dius, with each group containing more than 10,000 individualswith similar gyradii.

Fig. 6 plots average word happiness against the distancefrom expected location (A), and gyradius (B). Starting with lo-cation, we find that tweets written close to an individual’s cen-ter of mass are slightly happier than those written 1km away.The least happy words, on average, are used at a distance rep-resentative of a short daily commute to work. Beyond thisleast happy distance, remarkably we find that happiness in-creases logarithmically with distance from expected location.Perhaps even more remarkably, we find an almost identicaltrend when grouping together individuals rather than tweets,observing that happiness also increases logarithmically withgyradius. Individuals with a large radius use happier wordsthan those with a smaller pattern of life. We find the trend ob-served in Fig. 6 holds for 3 of the 4 urban areas (Los Angeles,San Francisco, and Chicago), see Figs. S4, S5.

To explain the difference in expressed happiness exhibitedby different mobility groups, we turn to word shift graphs inFig. 7. Word shift graphs were introduced by Dodds and

6

Page 7: Happiness and the Patterns of Life: A  Study of Geolocated Tweets

1 2 5 10 20 30−3

−2.5

−2

−1.5

−1

−0.5

0

Rank

Log−

10 P

roba

bilit

y

Slope =−1.269

−2 −1.5 −1 −0.5 00

20

40

60

80

100

Freq

uenc

y

Log−10 Probability Rank 1 Tweet Hubs

A

B

Figure 6: Representing the approximately 300 individuals for whom we have at least 800 geolocated messages, weplot the probability of tweeting from a habitat as a function of the tweet habitat rank (A). Each dot represents a singleindividual’s likelihood of tweeting from one of their habitats. The axes are logarithmic, revealing an approximateZipfian distribution with slope -1.3 [43]. (B) Distribution of the rank-1 habitat, each individual’s mode location.

rank radius (km) city1 200.6 Martinsville, VA2 124.5 Middletown, OH3 112.3 Elkhart, IN4 98.8 Pottstown, PA5 96.6 Decatur, IL· · · · · · · · ·215 13.3 New York City, NY247 11.4 Chicago, IL300 8.94 Los Angeles, CA387 4.33 San Francisco & Oakland, CA· · · · · · · · ·468 0.492 Greenville, MS469 0.491 Athens, OH470 0.465 Key West, FL471 0.381 El Centro Calexico, CA472 0.312 Pullman, WA

Table 2: Top and Bottom 5 cities with respect to meangyradius, along with examples from Figure 2.

west from the origin along y/sy = 0, with deviations inthe orthogonal direction becoming shorter and less fre-quent as they move farther away from the origin.

Second, the appearance of two spatially distinct yellowregions separated by a less populated green region sug-gests that people spend the vast majority of their time neartwo locations. We refer to these locations as the work andhome habitats, where the home habitat is centered on thedark red region roughly 1 standard deviation east of theorigin, and the work habitat is centered approximately 2standard deviations west of the origin. These locationshighlight the bimodal distribution of principal axis corri-dor messages (Figure 5A).

Finally, a clear asymmetry is observed about the x/sx =0 axis indicating the increasingly isotropic variation inmovement surrounding the home habitat, as compared tothe work habitat. We suspect this to be a reflection ofthe tendency to be more familiar with the surroundings ofone’s home, and to explore these surroundings in a moresocial context (Figure 5B). The symmetry observed whenreflecting about the y/sy = 0-axis is strong, demonstrat-ing the remarkable consistency of the movement patternsrevealed by the data.

In an effort to characterize the temporal and spatialstructure observed in Figure 4, in Figure 6 we examinelocations frequently visited by the most prolific membersof our data set, namely the roughly 300 individuals forwhom we received at least 800 geolocated messages. We

9

M T W Th F S Su2

3

4

5

6

7

8

9

10 x 10−3

Day

Prob

abilit

y

ModeOther

C

Figure 5: Above

1 10 100 10005.95

6

6.05

6.1

6.15

Distance (km) from Expected Location

Aver

age

Wor

d H

appi

ness

1 10 100 10005.95

6

6.05

6.1

6.15

User Radius of Gyration (km)

Aver

age

Wor

d H

appi

ness

A B

Figure 8: Tweets are grouped into ten equally populated bins by the distance from their author’s expected location, andthe average happiness of words written at each distance is plotted (A). Expressed happiness grows logarithmically withdistance from expected location. A similar trend is observed when individuals are grouped into ten equally populatedbins by their gyradius, and all words authored by individuals in each bin are gathered (B). These observed trendspersist through variation in binning and different measures of mobility.

radii.Figure 8 plots average word happiness against the dis-

tance from expected location (A), and gyradius (B). Start-ing with location, we find that tweets written close to anindividual’s center of mass are slightly happier than thosewritten 1km away. The least happy words, on average, areused at a distance representative of a short daily commuteto work. Beyond this least happy distance, remarkably wefind that happiness increases logarithmically with distancefrom expected location. Perhaps even more remarkably,we find an almost identical trend when grouping togetherindividuals rather than tweets, observing that happinessalso increases logarithmically with gyradius. Individualswith a large radius use happier words than those with asmaller pattern of life. We find the trend observed in Fig-ure 8 holds for 3 of the 4 urban areas (Los Angeles, SanFrancisco, and Chicago) visualized in Figure 2 (see Ap-pendix).

To explain the difference in expressed happinessexhibited by different mobility groups, we turn to wordshift graphs in Figure 9. Word shift graphs were intro-duced by Dodds and Danforth [18, 19] as a means forinvestigating the elements of language responsible forhappiness differences between two large texts. As anexample, consider the difference between tweets authoredat distances of roughly 1km and 2500km away from anindividual’s expected location. The average happinessscores for these two distances are havg = 5.96 andhavg = 6.13 respectively. Individual word contributions

to this difference are shown in Figure 9A, and can bedescribed as follows.

Words appearing on the right increase the happiness ofthe 2500km distance relative 1km distance. For example,tweets authored far from an individual’s expected locationare more likely to contain the positive words ‘beach’,‘new’, ‘great’, ‘park’, ‘restaurant’, ‘dinner’, ‘resort’,‘coffee’, ‘lunch’, ‘cafe’, and ‘food’, and less likely tocontain the negative words ‘no’, ‘don’t’, ‘not’, ‘hate’,‘can’t’, ‘damn’, and ‘never’ than tweets posted closeto home. Words going against the trend appear on theleft, decreasing the happiness of the 2500km distancegroup relative to the 1km group. Tweets close to homeare more likely to contain the positive words ‘me’, ‘lol’,‘love’, ‘like’, ‘haha’, ‘my’, ‘you’, and ‘good’. Movingclockwise, the three insets in Figure 9A show that thetwo text sizes are comparable, the biggest contributor tothe happiness difference is the decrease in negative wordsauthored by individuals very far from their expectedlocation, and the 50 words listed make up roughly 50%of the total difference between the two bags of words.

Note that the relatively small differences in havgscores reflect a small signal, yet one that we have shownpreviously can be resolved by our hedonometer [19].Additional word shift comparisons for the four urbanareas investigated earlier are provided in the Appendix.

Looking at the word differences between individualswith large and small radii of gyration in Figure 9B, wesee that individuals in the large radius group author the

11

Figure 6: Above

10

Figure 5: Representing the approximately 300 individuals for whom we have at least 800 geolocated messages, we plot theprobability of tweeting from a habitat as a function of the tweet habitat rank (A). Each dot represents a single individual’slikelihood of tweeting from one of their habitats. The axes are logarithmic, revealing an approximate Zipfian distribution withslope -1.3 [29]. (B) Distribution of the rank-1 habitat, each individual’s mode location. (C) A robust diurnal cycle is observedin the hourly time of day at which statuses are updated, with those from the mode location (black curve) occurring more oftenthan other locations (red curve) in the morning and evening. Probabilities sum to 1 for each curve, with bins for each hour.Dashed vertical lines denote midnight.

7

Page 8: Happiness and the Patterns of Life: A  Study of Geolocated Tweets

1 10 100 10005.95

6

6.05

6.1

6.15

Distance (km) from Expected Location

Aver

age

Wor

d H

appi

ness

1 10 100 10005.95

6

6.05

6.1

6.15

User Radius of Gyration (km)

Aver

age

Wor

d H

appi

ness

A B

Figure 8: Tweets are grouped into ten equally populated bins by the distance from their author’s expected location, andthe average happiness of words written at each distance is plotted (A). Expressed happiness grows logarithmically withdistance from expected location. A similar trend is observed when individuals are grouped into ten equally populatedbins by their gyradius, and all words authored by individuals in each bin are gathered (B). These observed trendspersist through variation in binning and different measures of mobility.

radii.Figure 8 plots average word happiness against the dis-

tance from expected location (A), and gyradius (B). Start-ing with location, we find that tweets written close to anindividual’s center of mass are slightly happier than thosewritten 1km away. The least happy words, on average, areused at a distance representative of a short daily commuteto work. Beyond this least happy distance, remarkably wefind that happiness increases logarithmically with distancefrom expected location. Perhaps even more remarkably,we find an almost identical trend when grouping togetherindividuals rather than tweets, observing that happinessalso increases logarithmically with gyradius. Individualswith a large radius use happier words than those with asmaller pattern of life. We find the trend observed in Fig-ure 8 holds for 3 of the 4 urban areas (Los Angeles, SanFrancisco, and Chicago) visualized in Figure 2 (see Ap-pendix).

To explain the difference in expressed happinessexhibited by different mobility groups, we turn to wordshift graphs in Figure 9. Word shift graphs were intro-duced by Dodds and Danforth [18, 19] as a means forinvestigating the elements of language responsible forhappiness differences between two large texts. As anexample, consider the difference between tweets authoredat distances of roughly 1km and 2500km away from anindividual’s expected location. The average happinessscores for these two distances are havg = 5.96 andhavg = 6.13 respectively. Individual word contributions

to this difference are shown in Figure 9A, and can bedescribed as follows.

Words appearing on the right increase the happiness ofthe 2500km distance relative 1km distance. For example,tweets authored far from an individual’s expected locationare more likely to contain the positive words ‘beach’,‘new’, ‘great’, ‘park’, ‘restaurant’, ‘dinner’, ‘resort’,‘coffee’, ‘lunch’, ‘cafe’, and ‘food’, and less likely tocontain the negative words ‘no’, ‘don’t’, ‘not’, ‘hate’,‘can’t’, ‘damn’, and ‘never’ than tweets posted closeto home. Words going against the trend appear on theleft, decreasing the happiness of the 2500km distancegroup relative to the 1km group. Tweets close to homeare more likely to contain the positive words ‘me’, ‘lol’,‘love’, ‘like’, ‘haha’, ‘my’, ‘you’, and ‘good’. Movingclockwise, the three insets in Figure 9A show that thetwo text sizes are comparable, the biggest contributor tothe happiness difference is the decrease in negative wordsauthored by individuals very far from their expectedlocation, and the 50 words listed make up roughly 50%of the total difference between the two bags of words.

Note that the relatively small differences in havgscores reflect a small signal, yet one that we have shownpreviously can be resolved by our hedonometer [19].Additional word shift comparisons for the four urbanareas investigated earlier are provided in the Appendix.

Looking at the word differences between individualswith large and small radii of gyration in Figure 9B, wesee that individuals in the large radius group author the

11

Figure 6: Tweets are grouped into ten equally populated bins by the distance from their author’s expected location, and theaverage happiness of words written at each distance is plotted (A). Expressed happiness grows logarithmically with distancefrom expected location. A similar trend is observed when individuals are grouped into ten equally populated bins by theirgyradius, and all words authored by individuals in each bin are gathered (B). These observed trends persist through variation inbinning and different measures of mobility.

Danforth [27, 33] as a means for investigating the elements oflanguage responsible for happiness differences between twolarge texts. As an example, consider the difference betweentweets authored at distances of roughly 1km and 2500km awayfrom an individual’s expected location. The average happinessscores for these two distances are havg = 5.96 and havg = 6.13respectively. Individual word contributions to this differenceare shown in Fig. 7A, and can be described as follows.

Words appearing on the right increase the happiness of the2500km distance relative 1km distance. For example, tweetsauthored far from an individual’s expected location are morelikely to contain the positive words ‘beach’, ‘new’, ‘great’,‘park’, ‘restaurant’, ‘dinner’, ‘resort’, ‘coffee’, ‘lunch’, ‘cafe’,and ‘food’, and less likely to contain the negative words ‘no’,‘don’t’, ‘not’, ‘hate’, ‘can’t’, ‘damn’, and ‘never’ than tweetsposted close to home. Words going against the trend appearon the left, decreasing the happiness of the 2500km distancegroup relative to the 1km group. Tweets close to home aremore likely to contain the positive words ‘me’, ‘lol’, ‘love’,‘like’, ‘haha’, ‘my’, ‘you’, and ‘good’. Moving clockwise, thethree insets in Fig. 7A show that the two text sizes are com-parable, the biggest contributor to the happiness difference isthe decrease in negative words authored by individuals veryfar from their expected location, and the 50 words listed makeup roughly 50% of the total difference between the two bagsof words.

Note that the relatively small differences in havg scores re-flect a small signal, yet one that we have shown previously canbe resolved by our hedonometer [27]. Additional word shiftcomparisons for the four urban areas investigated earlier areprovided in the Supplemental Material, Figs. S6, S7.

Looking at the word differences between individuals withlargest and smallest radii of gyration in Fig. 7B, we see thatindividuals in the large radius group author the negative words‘hate’, ‘damn’, ‘dont’, ‘mad’, ‘never’, ‘not’ and assorted pro-fanity less frequently, and the positive words ‘great’, ‘new’,‘dinner’, ‘hahaha’, and ‘lunch’ more frequently than the smallradius group. Going against the trend, the large radius groupuses the positive words ‘me’, ‘lol’, ‘love’, ‘like’, ‘funny’,‘girl’, and ‘my’ less frequently, and the negative words ‘no’,and ‘last’ more frequently. Comparing with other groups, thelarge radius group authors an increased frequency of words inreference to eating, like the words ‘dinner’, ‘lunch’, ‘restau-rant’, and ‘food’, and make less reference to traffic conges-tion.

Comparing the two figures, we note that individuals withlarge radius laugh more (e.g ‘hahaha’) than those with a smallradius, but individuals closer to their expected location laughmore than those far from home.

These word differences reveal the relationship between anindividual’s pattern of movement and their experiences. It isnot surprising to observe regular international travelers tweet-ing about the food they enjoy on vacation. Indeed, we expectthat individuals capable of tweeting at a great distance fromtheir expected location are more likely to benefit from an ad-vantaged socioeconomic status, which they happily update fre-quently. Previous work has demonstrated that expressed hap-piness correlates strongly with many socioeconomic indica-tors [26]. Nevertheless, setting aside these luxurious words,we still see a general decline in the use of negative words asindividuals travel farther from their expected location. In fact,of the four contributions to the difference in happiness between

8

Page 9: Happiness and the Patterns of Life: A  Study of Geolocated Tweets

−10 −5 0 5 10

1

5

10

15

20

25

30

35

40

45

50

no −?shit −?

+?me

+?love

+?lol

beach +B

don’t −?new +B

+?haha

−Bterminal

+?like

not −?hate −?

international +B

die −?can’t −?

+?you

park +B

great +B

bitch −?ass −?damn −?never −?nigga −?restaurant +B

con −?dont −?dinner +B

bad −?hell −?home +B

coffee +B

+?my

lunch +B

+?hahaha

ain’t −?cafe +B

resort +B

+?good

food +B

sick −?bored −?mad −?niggas −?tired −?

−Bwaiting

−Bearthquake

+?sleep

+?bed

hotel +B

Per word average happiness shift bhavg, r

(%)

Wo

rd r

an

k r

Tref

: Distance from Expected Location 1.08 (km) (havg

=5.96)

Tcomp

: Distance from Expected Location 2638.95 (km) (havg

=6.13)

Text size:T

ref T

comp

+? +B

−B −?

Balance:

−80 : +180

0 100

100

101

102

103

104

Yi=1

r bh

avg, i

A

−20 −10 0 10 20

1

5

10

15

20

25

30

35

40

45

50

shit −?+?me

+?lolass −?

+?lovehate −?

+?likegreat +B

−Bnonigga −?bitch −?

damn −?new +Bdont −?mad −?dinner +Bnever −?not −?hahaha +Blunch +Btoday +Bhell −?gone −?niggas −?home +Bdie −?awesome +Bbad −?

−Bconbitches −?food +Brestaurant +Bbored −?tired −?ill −?

+?funnyhurt −?

+?girlphoto +Bugly −?

+?my−Bfalling

stupid −?+?sleep

sick −?trip +Bcant −?mean −?

−Blastfuckin −?

Per word average happiness shift bhavg, r

(%)

Wor

d ra

nk r

Tref: User Radius of Gyration 3.11 (km) (havg=6.01)Tcomp: User Radius of Gyration 3188.08 (km) (havg=6.10)

Text size:T

ref T

comp

+? +B

−B −?

Balance:

−113 : +213

0 100

100

101

102

103

104

Yi=1r bh

avg, i

B

Figure 9: (Color online) Word shift graphs comparing (A) the lowest average word happiness distance from homegroup to the words authored farthest from home, which also has the largest average word happiness and (B) thesmallest gyradius group with the largest gyradius group. The words in the word shifts from top to bottom appearin decreasing order of ranked percentage contribution to the overall average happiness difference (Dhavg) of the twotexts being compared. The +/- symbols indicate whether the word has an average happiness score that is happy or sadrelative to the entire text Tre f . The symbols " / # indicate whether a word was used more or less in Tcomp relative tousage in Tre f . The left inset panel shows how the ranked top contributing words to Dhavg combine in sum. The fourcircles in the lower right show the total contribution of the four word types (+ ",� ",+ #,� #). Relative text size isrepresented by the grey squares. See [19] for further details and examples of word shift graphs.

12

Figure 7: (Color online) Word shift graphs comparing (A) the lowest average word happiness distance from home group tothe words authored farthest from home, which also has the largest average word happiness and (B) the smallest gyradiusgroup with the largest gyradius group. The words in the word shifts from top to bottom appear in decreasing order of rankedpercentage contribution to the overall average happiness difference (∆havg) of the two texts being compared. The +/- symbolsindicate whether the word has an average happiness score that is happy or sad relative to the entire text Tre f . The symbols ↑ / ↓indicate whether a word was used more or less in Tcomp relative to usage in Tre f . The left inset panel shows how the ranked topcontributing words to ∆havg combine in sum. The four circles in the lower right show the total contribution of the four wordtypes (+ ↑,− ↑,+ ↓,− ↓). Relative text size is represented by the grey squares. See [27] for further details and examples ofword shift graphs.

9

Page 10: Happiness and the Patterns of Life: A  Study of Geolocated Tweets

words authored close to home vs. far from home, this declinein negative words when far from home is the largest compo-nent (bottom right inset, Fig. 7).Discussion

Using 37 million geolocated tweets authored in 2011, thisstudy characterizes the pattern of life of over 180,000 individ-uals in the United States. While observed mobility patternsagree qualitatively with previous work investigating cellphonedata [2], we are able to connect movement patterns to changesin word usage for the first time. Our main finding is that ex-pressed happiness increases logarithmically with both distancefrom expected location and gyradius, largely because individ-uals who travel farther use positive, food related words morefrequently, and negative words and profanity less frequently.

Several methodological issues are raised by the use of Twit-ter messages to characterize mobility and happiness. Consid-ering Twitter as a source, we note that according to the PewInternet & American Life Project, roughly 15% of adults in theU.S. were actively using Twitter at the end of 2011 [34]. Whilethis fraction represents a substantial group of Americans, wehave no data to quantify the demographic group representedby the subset of these 15% who specifically choose to geotaga large percentage of their messages. Nevertheless, since wethreshold the sample to include individuals who have geolo-cated more than approximately 300 of their messages in 2011,we suspect that the large majority of individuals represented inour study regularly do so as a matter of daily life, as opposedto geolocating messages only when encountering a novel ex-perience such as a vacation.

Regarding word usage as a proxy for happiness, accessingthe internal emotional state of individuals is beyond the scopeof our instrument. We do believe however, that when aggre-gated, the words used by large groups of individuals reflecttheir culture in ways not captured by surveys or self-report.Indeed, we see the hedonometer as complementing more tra-ditional economic methods for characterizing economic andsocietal health, such as the Gross Domestic Product or Con-sumer Confidence Index. Using the same collection of geolo-cated messages explored here, the hedonometer was recentlyemployed by Mitchell et al. [26] to characterize trends in wordusage for cities. Expressed happiness was shown to correlateto hundreds of demographic, socio-economic, and health mea-sures, with interactive evidence available in the article’s onlineAppendix [35].

This work contributes to a growing body of literature aimedat observing, describing, modeling, and ultimately explain-ing the spatiotemporal dynamics of large-scale socio-technicalsystems. Natural extensions of this work might combine topo-logical measures of network interactions with geospatial datato predict the likelihood of new links appearing in a social net-work [36], or to measure the spread of emotions through ge-ographical and topological space [37]. The mobility patternsdescribed here could also be combined with more traditionalsurveys (e.g. census data) to inform public policy regardingmany important issues, for example relating to the ‘obesityepidemic’ and changes in word usage at the level of individualneighborhoods targeted by public health campaigns.

MethodsIn an effort at quality control for the geolocated messages,

we identified and removed messages posted by robotic ac-counts and programmed tweeting services designed to au-tomatically send tweets typically not reflecting informationabout human activity. Preliminary analyses revealed a notice-able presence of bots posting geolocated messages referring toweather, earthquakes, traffic, and coupons. We identified andignored tweets collected from individuals for whom at leasthalf of their tweets contained any of the words ‘pressure’, ‘hu-mid’, ‘humidity’, ‘earthquake’, ‘traffic’ or ‘coupon’.

Messages referencing Foursquare check-ins (typically of theform ‘I’m at starbucks http://4sq.com/qrel9g’) were retainedfor the purpose of characterizing the mobility profile of eachindividual. However, for results involving happiness, we ig-nored Foursquare check-in tweets as their content is unlikelyto directly reflect sentiment.

Finally, to ensure that individual movement profiles arebased on a reasonably sized collection of locations, for thisstudy we focus on individuals for whom we have at least 30geolocated tweets. Given the uniformity of the random sampleprovided by the gardenhose, we can assume these individualsgeolocated a minimum of approximately 300 status updates in2011.

For reasons of privacy, we ignored all user specific in-formation including individual names. In addition, wherethe trajectories traced out by specific individuals are vi-sualized, we obscured the coordinate system of reference.Tweets were assigned to urban areas as defined by the 2010United States Census Bureaus MAF/TIGER (Master AddressFile/Topologically Integrated Geographic Encoding and Ref-erencing) database [38].

The gyradius for individual a is defined as

r(a) =

√√√√ 1N(a)

N(a)

∑i=1

(~p(a)i −〈~p(a)〉)2 (1)

where the two-dimensional vector ~p(a)i is the ith position inthe trajectory of individual a, given by the geolocation of thatindividual’s ith tweet, as observed in our database. N(a) isthe total number of tweets from individual a, and 〈~p(a)〉 =1/N(a) ∑N(a)

i=1 ~p(a)i is the center of mass of their trajectory, whichwe denote their expected location. Note that if we considereach message to be a prediction of an individual’s location,then the gyradius is in fact the root mean square error (RMSE)of that prediction. Fig. S8 plots the Complementary Cumu-lative Distribution Function (CCDF) of the gyradii of all indi-viduals.

To compare the shape of individual trajectories, we normal-ize for both differences in gyradius and direction of trajec-tory. Considering each individual’s trajectory as a set of (x,y)-pairs {(x1,y1),(x2,y2), . . . ,(xN ,yN)}, we calculate the two di-mensional matrix known as the tensor of inertia, consideringeach point in a individual’s trajectory as an equally weightedmass at location (xi,yi). We then find this tensor’s eigenvec-tors and eigenvalues. The eigenvector corresponding to the

10

Page 11: Happiness and the Patterns of Life: A  Study of Geolocated Tweets

largest eigenvalue represents the axis along which most of theindividual’s trajectory occurs (hereafter called the individual’sprincipal axis). Previous work has demonstrated that for mostindividuals, this axis is parallel to the corridor between theirwork location and home [2, 4].

To normalize the different compass orientations of individ-ual trajectories, we rotate the coordinate system of each in-dividual so that their principal axis points due west. The ex-pected location for each individual (x, y) is then used to trans-late their position vector, i.e. (xi− x,yi− y), to ensure that theshape of each individual’s trajectory is in a common frame ofreference. However, the distances travelled by each individualvary widely despite their shared orientation (e.g. pedestrianvs. airline commute). In order to compare these trajectories,we calculate the standard deviation σx, σy for a given individ-ual’s trajectory, and divide their x- and y-coordinates by σx andσy, respectively. For more information about this process, in-cluding a pair of example trajectory normalizations, see Figs.S9-S13.

In an attempt to characterize time spent in each location, wedefine the ith tweet habitat for individual a, denoted H(a)

i , tobe a circle within which individual a posted at least 10 mes-sages [8]. The center of the circle is defined by the averageposition of all messages appearing in the habitat, and the ra-dius of the circle is chosen such that each tweet posted withina habitat is at most 100 meters away from the center, and nohabitats overlap. To measure the importance of habitat i toindividual a, we count the number of messages appearing ineach tweet habitat and produce the ranking R(H(a)

i ) for indi-vidual a. The probability that individual a tweets from habitatH(a)

i is

P(H(a)i ) =

|H(a)i |

N(a)(2)

where |H(a)i | is the number of tweet locations contained in

H(a)i . Notice that the habitat probabilities for individual a may

not sum to one since it may be the case that individual a hastweet locations that are not contained in a tweet habitat. Here-after, we will refer to an individual’s most frequently visited,or rank-1 habitat, as their mode location.

Using the labMT scores [27], we determine the average hap-piness (havg) of a given text T containing N unique words by

havg(T ) =∑N

i=1 havg(wi) · fi

∑Ni=1 fi

=N

∑i=1

havg(wi) · pi (3)

where fi is the frequency with which the ith word wi, forwhich we have an average word happiness score havg(wi),occurred in text T . The normalized frequency of wi is thengiven by pi = fi/∑N

i=1 fi.The hedonometer instrument can be tuned to emphasize the

most emotionally charged words by removing words within∆havg of the neutral score of havg = 5. It has been shown thatignoring these neutral words with 4 < havg(wi)< 6 provides agood balance of sensitivity and robustness, and thus we chose∆havg = 1 for this study [27].

word havg(wi)‘happy’ 8.30‘hahaha’ 7.94‘fresh’ 7.26

‘cherry’ 7.04‘pancake’ 6.96

‘piano’ 6.94‘and’ 5.22‘the’ 4.98‘of’ 4.94

‘down’ 3.66‘worse’ 2.70‘crash’ 2.60

‘:(’ 2.36‘war’ 1.80‘jail’ 1.76

Table 1: Example language assessment by Mechanical Turk(labMT) [27, 30] words and scores. Words with neutral scores4 < havg(wi)< 6 are colored gray and ignored when assigningthe happiness score to a large text.

References[1] Simini, F., Gonzalez, M. C., Maritan, A., Barabasi, A.-L. A universal

model for mobility and migration patterns. Nature. Vol. 484, No. 7392.(2012)

[2] Gonzalez, M. C., Hidalgo, C. A., Barabasi, A. L. Understanding indi-vidual human mobility patterns. Nature. Vol. 453, pp. 779-782, (2008)doi:10.1038/nature06958

[3] Eagle, N., Macy, M., Claxton, R. Network Diversity and Economic De-velopment. Science. Vol 328, No. 5981, pg. 1029-1031 (2010)

[4] Song, C., Qu, Z., Blumm, N., Barabsi, A.-L. Limits of Predictability inHuman Mobility. Science. Vol 327, No. 5968, pg. 1018-1021 (2010)

[5] de Montjoye, Y.-A., Hidalgo, C.A., Verleysen, M., Blondel, V.D. Uniquein the Crowd: The privacy bounds of human mobility. Nature ScientificReports. Vol 3, No. 1376. (2013)

[6] Wang, D., Pedreschi, D., Song, C., Gionatti, F., Barabasi, A.-L. Humanmobility, social ties, and link prediction. Proceedings of the 17th ACMSIGKDD international conference on Knowledge discovery and data min-ing, pp 1100-1108 (2011)

[7] Brockmann, D. D., Hufnagel, L. & Geisel, T. The scaling laws of humantravel. Nature 439, 462-465 (2006).

[8] Bagrow, J. P., Lin Y-R. Mesoscopic Structure and Social As-pects of Human Mobility. PLoS ONE 7(5): e37676. (2012)doi:10.1371/journal.pone.0037676

[9] Ramos-Fernandez, G. et al. Levy walk patterns in the foraging move-ments of spider monkeys (Ateles geoffroyi). Behav. Ecol. Sociobiol. 273,1743-1750 (2004)

[10] Palla, G., Barabasi, A.-L. & Vicsek, T. Quantifying social group evolu-tion. Nature 446, 664-667 (2007).

[11] Hidalgo, C. A. & Rodriguez-Sickert, C. The dynamics of a mobile phonenetwork. Physica A 387, 3017-3024 (2008).

[12] Barabasi, A.-L. The origin of bursts and heavy tails in human dynamics.Nature 435, 207-211 (2005).

[13] Schlich, R. & Axhausen, K.W. Habitual travel behaviour: Evidence froma six-week travel diary. Transportation 30, 13-36 (2003).

[14] Eagle, N. & Pentland, A. Eigenbehaviours: identifying structure in rou-tine. Behav. Ecol. Sociobiol. (in the press).

11

Page 12: Happiness and the Patterns of Life: A  Study of Geolocated Tweets

[15] Klafter, J., Shlesinger, M. F. & Zumofen, G. Beyond Brownian motion.Phys. Today 49, 33-39 (1996).

[16] Gonzalez, M. C., Lind, P. G. & Herrmann, H. J. A system of mobileagents to model social networks. Phys. Rev. Lett. 96, 088702 (2006).

[17] Horner, M. W. & O-Kelly, M. E. S Embedding economies of scale con-cepts for hub networks design. J. Transp. Geogr. 9, 255-265 (2001).

[18] Kitamura, R., Chen, C., Pendyala, R. M. & Narayaran, R. Micro-simulation of daily activity-travelpatterns for travel demand forecasting.Transportation 27,25-51 (2000).

[19] Colizza, V., Barrat, A., Barthelemy, M., Valleron, A.-J. & Vespignani,A. Modeling the worldwide spread of pandemic influenza: Baseline caseand containment interventions. PLoS Medicine 4, 95-110 (2007).

[20] Eubank, S. et al. Controlling epidemics in realistic urban social net-works. Nature 429, 180-184 (2004).

[21] Hufnagel, L., Brockmann, D. & Geisel, T. Forecast and control of epi-demics in a globalized world. Proc. Natl Acad. Sci. USA 101, 15124-15129 (2004).

[22] Hedstrom, P. Experimental Macro Sociology: Predictingthe Next Best Seller. Science 311 (5762), 786-787. (2006).[DOI:10.1126/science.1124707]

[23] Tumasjan, A., et al. Predicting elections with Twitter: What 140 char-acters reveal about political sentiment. Proceedings of the fourth interna-tional aaai conference on weblogs and social media. (2010).

[24] Kirilenko, A., et al. The flash crash: The impact of high frequency trad-ing on an electronic market. Available at SSRN 1686004 (2011).

[25] http://wheresgeorge.com

[26] Mitchell, L., Frank, M. R., Harris, K. D., Dodds, P. S., Danforth, C. M.The Geography of Happiness: Connecting Twitter sentiment and expres-sion, demographics, and objective characteristics of place. arXiv. (2013).http://arxiv.org/abs/1302.3299

[27] Dodds, P. S., Harris, K. D., Kloumann, I. M., Bliss, C. A., Danforth,C. M. Temporal Patterns of Happiness and Information in a Global-ScaleSocial Network: Hedonometrics and Twitter. PLoS ONE. 6(12): e26752.(2011). doi:10.1371/journal.pone.0026752

[28] http://www.uvm.edu/storylab/share/papers/frank2013a

[29] Zipf, G. Relative frequency as a determinant of phonetic change, Har-vard Studies in Classical Philology. (1929).

[30] Kloumann, I. M., Danforth, C. M., Harris, K. D., Bliss, C. A., Dodds, P.S. Positivity of the English Language. PLoS ONE 7(1): e29484. (2012).doi:10.1371/journal.pone.0029484

[31] Amazon’s Mechanical Turk service. Available athttps://www.mturk.com/. Accessed October 24, 2011.

[32] Rand, D. G. The promise of Mechanical Turk: How online labor marketscan help theorists run behavioral experiments. J Theor Biol. (2011).

[33] Dodds, P. S., Danforth, C. M. Measuring the Happiness of Large-ScaleWritten Expression: Songs, Blogs, and Presidents. Journal of HappinessStudies. (2009). dpi: 10.1007/s10902-009-9150-9

[34] Aaron S., Joanna B. Twitter Use 2012. Technical report, Pew ResearchInstitute, 2012.

[35] http://www.uvm.edu/storylab/share/papers/mitchell2013a

[36] Onnela, J.-P., Arbesman, S., Gonzlez, M. C., Barabsi, A.-L., Chris-takis,N. A. Geographic Constraints on Social Network Groups. PLoSONE 6(4): e16939. (2011) doi:10.1371/journal.pone.0016939

[37] Bliss, C. A., Kloumann, I. M., Harris, K. D., Danforth, C. M., Dodds, P.S. Twitter reciprocal reply networks exhibit assortativity with respect tohappiness. Journal of Computational Science 3(5), pp. 388-397, (2012).http://dx.doi.org/10.1016/j.jocs.2012.05.001

[38] U.S. Census Bureau Geography Division. 2010 CensusTIGER/Line Shapefiles. http://www.census.gov/geo/www/tiger/tgrshp2010/tgrshp2010.html, accessed February 2013.

AcknowledgementsWe would like to thank for their helpful comments and in-

put Brian Tivnan, James Bagrow, Yu-Ru Lin, Taylor Ricketts,Austin Troy, and Lisa Aultman-Hall. In addition, we wouldlike to thank MITRE Corporation, the UVM TransportationResearch Center, the Vermont Complex Systems Center, andthe Vermont Advance Computing Core for funding.Author Contributions

C.M.D. and P.S.D. designed the research, M.R.F. preparedthe figures, and C.M.D. and M.R.F. wrote the manuscript.M.R.F., L.M., P.S.D., and C.M.D analyzed the data and re-viewed the manuscript.Competing Financial Interests

The authors declare no competing financial interests.

12

Page 13: Happiness and the Patterns of Life: A  Study of Geolocated Tweets

Figure Captions

Figure 1: Each point corresponds to a geolocated tweet postedin 2011. Twitter activity is most apparent in urban areas.Note that the image contains no cartographic borders, simply asmall dot for each message. Legend: A (U.S.), B (Washington,D.C.), C (Los Angeles, C.A.), and D (Earth).

Figure 2: (Color online) The gyradius, calculated for each in-dividual, is shown for each tweet authored in four examplecities. Reflecting the pattern of urban life, we find messagesauthored by large radius individuals to be more likely to ap-pear in the main downtown area of each city, while messagesauthored by small radius individuals tend to appear outside ofthe main urban area. Histograms of gyradii for each city areshown in Fig. S1, along with tweet locations colored by dis-tance from expected location (Fig. S2).

Figure 3: (Color online) The probability density function ofobserving an individual in their normalized reference frame,where the origin corresponds to each individual’s expected lo-cation, and σy = 0 corresponds to their principle axis. Thismap shows the positions of over 37,000 individuals, each withmore than 50 locations, in their intrinsic reference frame.

Figure 4: Looking at messages authored in the principle axiscorridor, defined by | y

σy|< 30

1000 , we observe a clear separationbetween the most likely and second most likely position (A).The distribution is skewed left, with movement in a headingopposite an individual’s work/home corridor observed to behighly unlikely. In addition, due to the normalization, we seethat individuals are much more likely to tweet slightly east oftheir expected location than slightly west. The isotropy ratio(B) measures the change in the density’s shape as a functionof gyradius, with large radius individuals exhibiting a less cir-cular pattern of life. Standard errors are plotted, but are onlyvisible for the largest radius group. The isotropy ratio decayslogarithmically with radius.

Figure 5: Representing the approximately 300 individuals forwhom we have at least 800 geolocated messages, we plot theprobability of tweeting from a habitat as a function of thetweet habitat rank (A). Each dot represents a single individ-ual’s likelihood of tweeting from one of their habitats. Theaxes are logarithmic, revealing an approximate Zipfian dis-tribution with slope -1.3 [29]. (B) Distribution of the rank-1habitat, each individual’s mode location. (C) A robust diurnalcycle is observed in the hourly time of day at which statusesare updated, with those from the mode location (black curve)occurring more often than other locations (red curve) in themorning and evening. Probabilities sum to 1 for each curve,with bins for each hour. Dashed vertical lines denote midnight.

Figure 6: Tweets are grouped into ten equally populated binsby the distance from their author’s expected location, and theaverage happiness of words written at each distance is plotted(A). Expressed happiness grows logarithmically with distancefrom expected location. A similar trend is observed when in-dividuals are grouped into ten equally populated bins by theirgyradius, and all words authored by individuals in each bin aregathered (B). These observed trends persist through variationin binning and different measures of mobility.

Figure 7: (Color online) Word shift graphs comparing (A) thelowest average word happiness distance from home group tothe words authored farthest from home, which also has thelargest average word happiness and (B) the smallest gyradiusgroup with the largest gyradius group. The words in the wordshifts from top to bottom appear in decreasing order of rankedpercentage contribution to the overall average happiness dif-ference (∆havg) of the two texts being compared. The +/- sym-bols indicate whether the word has an average happiness scorethat is happy or sad relative to the entire text Tre f . The sym-bols ↑ / ↓ indicate whether a word was used more or less inTcomp relative to usage in Tre f . The left inset panel shows howthe ranked top contributing words to ∆havg combine in sum.The four circles in the lower right show the total contributionof the four word types (+ ↑,− ↑,+ ↓,− ↓). Relative text sizeis represented by the grey squares. See [27] for further detailsand examples of word shift graphs.

13

Page 14: Happiness and the Patterns of Life: A  Study of Geolocated Tweets

Supplementary Material:Happiness and the Patterns of Life: A Study of GeolocatedTweetsMorgan R. Frank, Lewis Mitchell, Peter S. Dodds,Christopher M. DanforthComputational Story Lab, Department of Mathematics and Statistics,Vermont Complex Systems Center, Vermont Advanced Computing Core,University of Vermont, Burlington, Vermont, United States of America

−2 −1 0 1 2 30

0.05

0.1

0.15

0.2

0.25

0.3

Log−10 User Radius of Gyration

Prob

abilit

y

New York City

−2 −1 0 1 2 30

0.05

0.1

0.15

0.2

0.25

0.3

Log−10 User Radius of GyrationPr

obab

ility

Los Angeles

−2 −1 0 1 2 30

0.05

0.1

0.15

0.2

0.25

0.3

Log−10 User Radius of Gyration

Prob

abilit

y

Chicago

−2 −1 0 1 2 30

0.05

0.1

0.15

0.2

0.25

0.3

Log−10 User Radius of Gyration

Prob

abilit

y

San Francisco Bay

Figure A7: The distributions of gyradius (km) for four cities appear to be log-normal. The mode distance (binned) islarger for Los Angeles and San Francisco than for Chicago and New York City. We note that these distributions werecalculated for all individuals whose expected location fell within the latitude and longitude bounds of Figure 2, andthus reflect a modified set of individuals than those identified with cities in Figure 3.

19

Figure S1: The distributions of gyradius (km) for four citiesappear to be log-normal. The mode distance (binned) is largerfor Los Angeles and San Francisco than for Chicago and NewYork City. We note that these distributions were calculated forall individuals whose expected location fell within the latitudeand longitude bounds of main text Fig. 2, and thus reflect amodified set of individuals than those identified with cities inFig. S3.

14

Page 15: Happiness and the Patterns of Life: A  Study of Geolocated Tweets

Figure S2: (Color online) The distance from expected location, calculated for each individual, is shown for each tweet authoredin four example cities in 2011. Messages authored far from this location are more likely to appear in the main downtown area ofeach city (tourists and long-distance commuters), while messages authored close to the expected location tend to appear outsideof the main urban area.

15

Page 16: Happiness and the Patterns of Life: A  Study of Geolocated Tweets

2.5 3 3.5 4 4.5 5 5.5 6 6.5−1

−0.5

0

0.5

1

1.5

2

2.5

Log−10 City Population

Log−

10 M

ean

Rad

ius

of G

yrat

ion

(km

)

ChicagoSan FranciscoLos AngelesNew York City

4 4.5 5 5.5 6 6.5 7−1

−0.5

0

0.5

1

1.5

2

2.5

Log−10 City Land Area (km)

Log−

10 M

ean

Rad

ius

of G

yrat

ion

(km

)

ChicagoSan FranciscoLos AngelesNew York City

A B

Figure 3: (Color online) The mean gyradius of individuals whose expected location falls within each city is plottedagainst the city’s population (A) and land area (B). Shown are cities containing at least 50 individuals with a nonzerogyradius, each individual having authored at least 30 geolocated tweets. City boundaries are defined by [30] whichencompasses a smaller area for the four cities illustrated in Figure 2. Generally, gyradius increases with city populationand land area, with no large cities exhibiting a small mean radius. Pearson correlations: Population r = 0.10, p = 0.03,Land Area r = 0.24, p = 2⇥10�7.

the Pew Internet & American Life Project, roughly 15%of adults in the U.S. were actively using Twitter at the endof the year during which we collected data [42]. Whilethis fraction represents a substantial group of Americans,we have no data to quantify the demographic group repre-sented by the subset of these 15% who specifically chooseto geotag a large percentage of their messages. Neverthe-less, since we threshold the sample to include individu-als who have geolocated more than approximately 300 oftheir messages in 2011, we suspect that the large majorityof individuals represented in our study regularly do so asa matter of daily life, as opposed to geolocating messagesonly when encountering a novel experience such as a va-cation.

Regarding word usage as a proxy for happiness, ac-cessing the internal emotional state of individuals is be-yond the scope of our instrument. We do believe however,that when aggregated, the words used by large groups ofindividuals reflect their culture in ways not captured bysurveys or self-report. Indeed, we see the hedonometeras complementing more traditional economic methods forcharacterizing economic and societal health, such as theGross Domestic Product or Consumer Confidence Index.Using the same collection of geolocated messages ex-plored here, the hedonometer was recently employed byMitchell et al. [17] to characterize trends in word usagefor cities. Expressed happiness was shown to correlate

to hundreds of demographic, socio-economic, and healthmeasures, with interactive evidence available in the arti-cle’s online Appendix1.

3 Results

In Figure 2, we investigate the geographical distributionof movement in four urban areas by plotting a dot foreach tweet, colored by the gyradius of its author. Clock-wise from the top left, cities are displayed in order of theirapparent aggregate gyradius, with New York City seem-ingly exhibiting a smaller radius than the San FranciscoBay Area. Reflecting the pattern of urban life, we findmessages authored by large radius individuals to be morelikely to appear in the main downtown area of each city,while messages authored by small radius individuals tendto appear in less densely populated areas. For example,in Chicago, many individuals writing from downtown ex-hibit an order of magnitude greater radius than individualsposting in areas outside of the city.

In the greater Los Angeles area, we see several clustersof individuals with larger radius in downtown Los Ange-les, as well Long Beach, Santa Monica, and Disneylandin Anaheim, while less densely populated areas are seen

1http://www.uvm.edu/storylab/share/papers/

mitchell2013a

6

Figure S3: (Color online) The mean gyradius of individu-als whose expected location falls within each city is plottedagainst the city’s population (A) and land area (B). Shown arecities containing at least 50 individuals with a nonzero gyra-dius, each individual having authored at least 30 geolocatedtweets. City boundaries are defined by [38] which encom-passes a smaller area for the four cities illustrated in the maintext Fig. 2. Generally, gyradius increases with city popula-tion and land area, with no large cities exhibiting a small meanradius. Pearson correlations: Population ρ = 0.10, p = 0.03,Land Area ρ = 0.24, p = 2×10−7.

In Fig. S3, we plot the mean gyradius vs land area for eachcity in the U.S. as defined by [38]. While there is considerablescatter among the cities, we do observe a weak correlation in-dicating that individuals living in more populated and largerareas travel farther. Table S1 shows the top and bottom citieswith respect to mean gyradius, as well as the four cities dis-cussed above.

rank radius (km) city1 200.6 Martinsville, VA2 124.5 Middletown, OH3 112.3 Elkhart, IN4 98.8 Pottstown, PA5 96.6 Decatur, IL· · · · · · · · ·215 13.3 New York City, NY247 11.4 Chicago, IL300 8.94 Los Angeles, CA387 4.33 San Francisco & Oakland, CA· · · · · · · · ·468 0.492 Greenville, MS469 0.491 Athens, OH470 0.465 Key West, FL471 0.381 El Centro Calexico, CA472 0.312 Pullman, WA

Table S1: Top and Bottom 5 cities with respect to mean gy-radius, along with the four cities investigated in main text Fig2.

16

Page 17: Happiness and the Patterns of Life: A  Study of Geolocated Tweets

New York City Los Angeles

1 10 1005.85

5.9

5.95

6

6.05

Distance (km) from Expected Location

Aver

age

Wor

d H

appi

ness

1 10 1005.85

5.9

5.95

6

6.05

Distance (km) from Expected Location

Aver

age

Wor

d H

appi

ness

Chicago San Francisco Bay Area

1 10 1005.85

5.9

5.95

6

6.05

Distance (km) from Expected Location

Aver

age

Wor

d H

appi

ness

1 10 1005.85

5.9

5.95

6

6.05

Distance (km) from Expected Location

Aver

age

Wor

d H

appi

ness

Figure A8: For New York City, Los Angeles, Chicago, and the San Francisco Bay Area, we group messages intoequally sized bins by the distance from expected location of their author, and measure the average word happiness ofeach group. These plots exhibit similar trends to that observed in Figure 8A with the exception of New York City.

20

Figure S4: For New York City, Los Angeles, Chicago, and the San Francisco Bay Area, we group messages into equally sizedbins by the distance from expected location of their author, and measure the average word happiness of each group. These plotsexhibit similar trends to that observed in main text Fig. 6A with the exception of New York City.

17

Page 18: Happiness and the Patterns of Life: A  Study of Geolocated Tweets

New York City Los Angeles

1 10 1005.85

5.9

5.95

6

6.05

User Radius of Gyration (km)

Aver

age

Wor

d H

appi

ness

1 10 1005.85

5.9

5.95

6

6.05

User Radius of Gyration (km)

Aver

age

Wor

d H

appi

ness

Chicago San Francisco Bay Area

1 10 1005.85

5.9

5.95

6

6.05

User Radius of Gyration (km)

Aver

age

Wor

d H

appi

ness

1 10 1005.85

5.9

5.95

6

6.05

User Radius of Gyration (km)

Aver

age

Wor

d H

appi

ness

Figure A9: For New York City (130 individuals/bin), Los Angeles (175 individuals/bin), Chicago (125 individu-als/bin), and the San Francisco Bay Area (63 individuals/bin), we group individuals into equally sized bins by theirgyradius and measure the average word happiness of each group. These plots exhibit similar trends to that observedin Figure 8B with the exception of the largest radius group in the San Francisco Bay Area, and New York City as awhole.

21

Figure S5: For New York City (130 individuals/bin), Los Angeles (175 individuals/bin), Chicago (125 individuals/bin), andthe San Francisco Bay Area (63 individuals/bin), we group individuals into equally sized bins by their gyradius and measurethe average word happiness of each group. These plots exhibit similar trends to that observed in main text Fig. 6B with theexception of the largest radius group in the San Francisco Bay Area, and New York City as a whole.

18

Page 19: Happiness and the Patterns of Life: A  Study of Geolocated Tweets

−20 −10 0 10 20

1

5

10

15

20

25

30

35

40

45

50

shit −?car +B

+?lol−Bno

me +B+?love

ass −?hate −?

bitch −?nigga −?

+?likedont −?

+?hahadamn −?mad −?never −?

−Bconniggas −?hell −?great +Btired −?gone −?bitches −?google +Bbored −?hurt −?cant −?sick −?

−Blastweekend +B

+?sleepstupid −?ill −?home +Bawesome +B

+?my+?girl

+?funnyugly −?cry −?lie −?

+?babyfuckin −?mean −?fun +Bdinner +B

+?youlunch +Bbad −?hungry −?

Per word average happiness shift bhavg, r

(%)

Wor

d ra

nk r

Tref: User Radius of Gyration 6.55 (km) (havg=6.01)Tcomp: User Radius of Gyration 292.03 (km) (havg=6.06)

Text size:T

ref T

comp

+? +B

−B −?

Balance:

−125 : +225

0 100

100

101

102

103

104

Yi=1r bh

avg, i

−15 −10 −5 0 5 10 15

1

5

10

15

20

25

30

35

40

45

50

car +Bshit −?

me +Bdie −?

−Bnotraffic −?

+?haha+?love

+?lolhate −?

ass −?bitch −?

dont −?+?like−Bcon

rent −?accident −?not −?nigga −?google +Bmad −?never −?damn −?hell −?bad −?home +B

−Bmiss+?good

tired −?+?bed+?you

awesome +Bdinner +Blunch +Bill −?bitches −?bored −?delay −?zit −?

−Blastsorry −?great +Bsick −?cant −?fuckin −?

+?sleepniggas −?cry −?stop −?stupid −?

Per word average happiness shift bhavg, r

(%)

Wor

d ra

nk r

Tref: User Radius of Gyration 13.26 (km) (havg=6.02)Tcomp: User Radius of Gyration 292.03 (km) (havg=6.06)

Text size:T

ref T

comp

+? +B

−B −?

Balance:

−119 : +219

0 100

100

101

102

103

104

Yi=1r bh

avg, i

Figure A10: (Color online) We compare the 6.55 km gyradius group versus the 292.03 km gyradius group (Left). Wefind that the 292.03 km group has relatively frequent use of the words ‘car’ and ‘weekend’ suggesting that this grouptravels on the weekends perhaps to a vacation home as suggested by use of the word ‘home’. (Right) We comparethe 13.26 km gyradius group versus the 292.03 km gyradius group. We find that the 292.03 km group uses the word‘car’ more frequently than the 13.26 km group which, interestingly, uses the word ‘traffic’ more frequently. Again theincreased relative usage of these words seems fitting for a groups with these patterns of movement.

22

Figure S6: (Color online) We compare the 6.55 km gyradius group versus the 292.03 km gyradius group (Left). We find thatthe 292.03 km group has relatively frequent use of the words ‘car’ and ‘weekend’ suggesting that this group travels on theweekends perhaps to a vacation home as suggested by use of the word ‘home’. (Right) We compare the 13.26 km gyradiusgroup versus the 292.03 km gyradius group. We find that the 292.03 km group uses the word ‘car’ more frequently than the13.26 km group which, interestingly, uses the word ‘traffic’ more frequently. Again the increased relative usage of these wordsseems fitting for a groups with these patterns of movement.

19

Page 20: Happiness and the Patterns of Life: A  Study of Geolocated Tweets

−10 −5 0 5 10

1

5

10

15

20

25

30

35

40

45

50

shit −?ass −?

+?lolhell −?

nigga −?−Blast

fun +Bcant −?great +Bgone −?

+?mewait −?

−Bsick+?life

wrong −?−Bdie

not −?ill −?

−Bdown+?like+?my

+?sleepweak −?

+?cuteweekend +Bdeath −?won +Bsong +Bgame +Bthanks +Bmad −?coffee +B

+?moneyworst −?

+?sexshooting −?fuckin −?

−Bkillingnew +B

−Bfightgood +Blunch +B

+?girlbitch −?win +Bbest +Bwar −?problem −?evil −?idiot −?

Per word average happiness shift bhavg, r (%)

Wor

d ra

nk r

Tref: User Radius of Gyration 4.79 (km) (havg=5.87)Tcomp: User Radius of Gyration 223.64 (km) (havg=6.06)

Text size:Tref Tcomp

+? +B

−B −?

Balance:

−152 : +252

0 100

100

101

102

103

104

Yi=1r bhavg, i

−30 −20 −10 0 10 20 30

1

5

10

15

20

25

30

35

40

45

50

+?meshit −?

no −?problem −?

−Bbitch−Blost

haha +B+?sleep

scared −?−Blast

win +B−Bill

−Bsorryugh −?hate −?

−Bdont−Bkill

damn −?bored −?nothing −?weekend +Bcheated −?

+?hopefunny +Blol +Bcough −?pain −?

+?startingfaggot −?ugly −?fell −?bitches −?dead −?

−Bpoorinjury −?

−Bbomb−Bmean−Bmad

+?superfamily +B

+?playingfuckin −?

−Bblockedlies −?like +Bsurgery −?shame −?cool +B

+?girlsbad −?

Per word average happiness shift bhavg, r (%)

Wor

d ra

nk r

Tref: User Radius of Gyration 0.87 (km) (havg=5.98)Tcomp: User Radius of Gyration 123.54 (km) (havg=6.07)

Text size:Tref Tcomp

+? +B

−B −?

Balance:

−430 : +530

0 100

100

101

102

103

104

Yi=1r bhavg, i

Figure A12: (Color online) (Left) A word shift comparing the 4.79 km gyradius group to the 223 km gyradius forChicago. We observe the first group is less happy because of increased usage of profanity and negative words like‘can’t’, ‘gone’, and ’wrong’. (Right) A word shift comparing the .87 km gyradius group to the 123.54 km gyradiusgroup for the San Francisco Bay Area. We find the second group to be happier because of an increase in positivewords like ‘haha’, ‘win’, ‘weekend’, ‘funny’, and ‘lol’, along with a decrease in negative words like ‘no’, ‘problem’,and ‘hate’.

24

Figure S7: (Color online) (Left) A word shift comparing the 4.79 km gyradius group to the 223 km gyradius for Chicago.We observe the first group is less happy because of increased usage of profanity and negative words like ‘can’t’, ‘gone’, and’wrong’. (Right) A word shift comparing the .87 km gyradius group to the 123.54 km gyradius group for the San Francisco BayArea. We find the second group to be happier because of an increase in positive words like ‘haha’, ‘win’, ‘weekend’, ‘funny’,and ‘lol’, along with a decrease in negative words like ‘no’, ‘problem’, and ‘hate’.

20

Page 21: Happiness and the Patterns of Life: A  Study of Geolocated Tweets

10−3

10−1

101

103

10−4

10−2

100

Radius of Gyration (km)

CC

DF

Figure S8: Complementary Cumulative Distribution Function (CCDF) for the gyradii of all users with at least 30 geotaggedmessages. Gonzalez [2] found this distribution to be well modeled by a truncated power law, exponential tail.

21

Page 22: Happiness and the Patterns of Life: A  Study of Geolocated Tweets

Normalizing Human TrajectoryTo compare the shape of trajectories of individuals traveling

in different directions and over different distances, we use themethods introduced by Gonzalez et al. [2]. We will examinethe normalization steps for two individuals we will call userA and user B. We have 768 geolocated tweets for user Aand 1,882 geolocated tweets for user B. User A has gyradiusrA = 463.61 km and user B has gyradius rB = 54.28 km. Fig.S9 represents the geospatial tweet locations for user A anduser B, but we shifted their coordinate system to maintaintheir anonymity. We have also allowed for a slight spatialseparation between the locations for user A and the locationsof user B for clarity.

−8 −6 −4 −2 0 2 4 6 8 10−8

−6

−4

−2

0

2

4

Longitude Degrees

Latit

ude

Deg

rees

User AUser B

2.54 2.56 2.58 2.6 2.62 2.64

2.035

2.04

2.045

2.05

2.055

2.06

2.065

2.07aaaaaa

���

���

⌦⌦⌦⌦⌦⌦⌦⌦⌦

BBBBBBBBB

6 7 8 9 10−0.5

0

0.5

1

1.5

2

Figure 8: Tweet locations for User A and User B.

−16 −14 −12 −10 −8 −6 −4 −2 0 2−2

−1

0

1

2

3

4

Longitudinal Distance (km) from Expected Location

Latit

udin

al D

ista

nce

(km

) fro

m E

xpec

ted

Loca

tion

User AUser B

Figure 11: The results after rotating the locations of UserA and User B. We see that they now both have principalaxes of trajectory pointing due west.

−3.5 −3 −2.5 −2 −1.5 −1 −0.5 0 0.5−6

−4

−2

0

2

4

6

8

x/σx

y/σ

y

User AUser B

Figure 12: The rotated tweet locations of User A and UserB after normalizing for radius of gyration. The origin rep-resents the center of mass of the respective individuals’trajectory, namely < ~pa > from equation (2).

15

Figure S9: (Color online) Tweet locations for User A and UserB.

In Fig. S10, we apply the linear transformation shiftingeach location for the user to the distance in kilometers fromtheir center of mass, i.e. the expected location of the user.The difference in gyradius between user A and user B is stillvery apparent in the axis ranges for this plot. Notice that thedirectional relationships between the tweet locations for eachuser have still been preserved. We can see that user A travelspredominantly in a southwest direction, while user B travelsprimarily in a northwest direction.

To normalize for direction of travel, let the set oftweet locations for user i be represented by the set ofequally weighted masses at each of the tweet locations{(x1,y1),(x2,y2), . . . ,(xn,yn)}. Now we calculate the tensorof inertia (I) for each set of weighted (x,y)-points as

I =

n

∑j=1

x2j −

n

∑j=1

x jy j

−n

∑j=1

x jy j

n

∑j=1

y2j

The eigenvector of I corresponding to the largest eigenvalue ofI represents the direction along which most of user i’s trajec-tory occurs; we call this the principal axis for user i (see Fig.S11).

−15 −10 −5 0 5−8

−7

−6

−5

−4

−3

−2

−1

0

1

Longitudinal Distance (km) from Expected Location

Latit

udin

al D

ista

nce

(km

) fr

om E

xpec

ted

Loca

tion User A

−1 −0.5 0 0.5−0.2

0

0.2

0.4

0.6

0.8

1

1.2

Longitudinal Distance (km) from Expected Location

Latit

udin

al D

ista

nce

(km

) fr

om E

xpec

ted

Loca

tion User B

Figure S10: (Color online) Tweet locations for User A andUser B transformed to the distance in kilometers from theirexpected locations, respectively.

−16 −14 −12 −10 −8 −6 −4 −2 0 2 4−10

−5

0

5

Longitudinal Distance (km) from Expected Location

Latit

udin

al D

ista

nce

(km

) fr

om E

xpec

ted

Loca

tion

User AUser BPrincipal Axis APrincipal Axis B

Figure S11: (Color online) The tweet locations for User A andUser B along with a line representing the principal axis for thatuser.

22

Page 23: Happiness and the Patterns of Life: A  Study of Geolocated Tweets

Now we can determine the angle necessary to rotate the setof points for user i so that the the resulting principal axis is thex-axis. Fig. S12 shows the results of this step. We see that theprincipal axis for user A and user B is now the x-axis.

−16 −14 −12 −10 −8 −6 −4 −2 0 2−2

−1

0

1

2

3

4

Longitudinal Distance (km) from Expected Location

Latit

udin

al D

ista

nce

(km

) fr

om E

xpec

ted

Loca

tion

User AUser B

Figure S12: (Color online) The results after rotating the loca-tions of User A and User B. We see that they now both haveprincipal axes of trajectory pointing due west.

The final step is to normalize for individuals with differ-ent gyradius. We accomplish this by dividing the x-coordinateof each rotated tweet location for user i by σx, where σx isthe standard deviation of the x-coordinates of the rotated tweetlocations for user i, and similarly dividing by σy for the y-coordinates. The final result is shown in Fig. S13.

−3.5 −3 −2.5 −2 −1.5 −1 −0.5 0 0.5−6

−4

−2

0

2

4

6

8

x/σx

y/σ y

User AUser B

Figure S13: (Color online) The rotated tweet locations of UserA and User B after normalizing for gyradius. The origin rep-resents the center of mass of the respective individuals’ trajec-tory, namely 〈~pa〉 from equation (2).

As a result, we can compare the shape of the trajectoriesfor User A and User B having normalized for direction andgyradius. We can see that both User A and User B have most

of their normalized tweet locations in two main clusters.

23