Spatio-temporal demographic classification of the Twitter users

Paul Longley, Muhammad Adnan, Guy Lansley
Department of Geography, University College London
Web: http://www.uncertaintyofidentity.com

Uploaded by: muhammad-adnan

Posted on 06-Jul-2015


DESCRIPTION

Use of social media continues to grow day by day, with implications for the creation of ‘big’ data: the total volume of digital data created worldwide was forecast to reach 1.8 zettabytes in 2011. This talk presents initial work towards creating geo-temporal geodemographic classifications from Twitter data. London was chosen as the study area because of its high incidence of users and the consequent expectation that higher penetration might be associated with lower demographic bias.

TRANSCRIPT

Page 1: Spatio-temporal demographic classification of the Twitter users

Spatio-temporal demographic classification of the Twitter users

Paul Longley, Muhammad Adnan, Guy Lansley
Department of Geography, University College London

Web: http://www.uncertaintyofidentity.com

Page 2

Outline

1. Introduction
   • Geodemographics
   • Social Media Geodemographics

2. Twitter

3. A geo-temporal demographic classification of Twitter users
   • Residence of Twitter users
   • Ethnic classification of Twitter users
   • Age classification of Twitter users
   • Computing the demographic classification

Page 3

Introduction

• Geodemographics
   • “Analysis of people by where they live” [1]
   • Night-time characteristics of the population

• Social Media Geodemographics
   • Moving beyond the night-time geography
   • Who: Ethnicity, Gender, and Age of social media users
   • When: What time of day conversations happen
   • Where: Where social media conversations happen

[1] Sleight, P. 2004. Targeting Customers: How to Use Geodemographic and Lifestyle Data in Your Business.

Page 4

Twitter (www.twitter.com)

• Online social-networking and microblogging service

• Launched in 2006

• Users can send messages of 140 characters or less

• Approximately 200 million active users [2]

• 350 million tweets daily

• In 2013, the UK ranked 4th among countries and London 3rd among cities by number of tweets posted [3]

[2] Twitter. 2012. What is Twitter? Retrieved 31 December 2012 from https://business.twitter.com/basics/what-is-twitter/.

[3] Bennett, S. 2013. Revealed: The Top 20 Countries and Cities of Twitter [STATS]. Retrieved 31 December 2013 from http://www.mediabistro.com/alltwitter/twitter-top-countries_b26726.

Page 5

Data available through the Twitter API

• User Creation Date

• Followers

• Friends

• User ID

• Language

• Location

• Name

• Screen Name

• Time Zone

• Geo Enabled

• Latitude

• Longitude

• Tweet date and time

• Tweet text
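As an illustration of how these fields might be extracted from a raw tweet, the sketch below flattens one tweet object into a single record. It assumes the JSON layout of the Twitter REST API v1.1 (a nested `user` object, and GeoJSON `coordinates` in longitude, latitude order); the `TweetRecord` class and `parse_tweet` function are hypothetical names, and whether "Language" refers to the user or the tweet language is not stated on the slide (the user-level field is assumed here).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TweetRecord:
    """Hypothetical flat record holding the fields listed on the slide."""
    user_id: int
    user_created_at: str
    followers: int
    friends: int
    language: Optional[str]
    location: Optional[str]
    name: str
    screen_name: str
    time_zone: Optional[str]
    geo_enabled: bool
    latitude: Optional[float]
    longitude: Optional[float]
    tweet_created_at: str
    text: str


def parse_tweet(tweet: dict) -> TweetRecord:
    """Flatten a v1.1-style tweet dict (assumed layout) into a TweetRecord."""
    user = tweet["user"]
    lat = lon = None
    coords = tweet.get("coordinates")
    if coords:
        # GeoJSON stores points as [longitude, latitude]
        lon, lat = coords["coordinates"]
    return TweetRecord(
        user_id=user["id"],
        user_created_at=user["created_at"],
        followers=user["followers_count"],
        friends=user["friends_count"],
        language=user.get("lang"),
        location=user.get("location"),
        name=user["name"],
        screen_name=user["screen_name"],
        time_zone=user.get("time_zone"),
        geo_enabled=bool(user.get("geo_enabled", False)),
        latitude=lat,
        longitude=lon,
        tweet_created_at=tweet["created_at"],
        text=tweet["text"],
    )
```

Tweets without a geotag simply get `None` for latitude and longitude, so they can be filtered out before any spatial analysis.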

Page 6

Twitter data for the case study

• Approx. 8 million geo-tagged tweets (Jan – Dec, 2013)

• Sent by 385,050 unique users

• 155,249 users sent 5 or more tweets (7.6 million tweets)
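The activity threshold above (users who sent 5 or more tweets) can be applied with a simple count per user. A minimal sketch, assuming tweets are held as hypothetical `(user_id, text)` pairs:

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Tweet = Tuple[int, str]  # hypothetical (user_id, text) pair


def active_users(user_ids: Iterable[int], min_tweets: int = 5) -> Set[int]:
    """Users who sent at least min_tweets tweets."""
    counts = Counter(user_ids)
    return {uid for uid, n in counts.items() if n >= min_tweets}


def filter_active(tweets: List[Tweet], min_tweets: int = 5) -> List[Tweet]:
    """Keep only the tweets sent by active users."""
    keep = active_users((uid for uid, _ in tweets), min_tweets)
    return [t for t in tweets if t[0] in keep]
```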

Page 7

Variables for creating a geo-temporal classification

1. Residence
   • Where Twitter users live

2. Ethnicity
   • Probable ethnic origins of Twitter users

3. Age
   • Probable age of Twitter users

4. Land-use category of a tweet
   • Residential; non-domestic building; park, etc.

5. Temporal scales
   • Day, afternoon, night, peak travel hours

Page 8

Residence of Twitter Users

• A 170 m × 170 m grid was used to identify the probable residence of each user

• A probable residence was identified for 75,522 users
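The slide gives the grid size but not the rule for choosing a residence cell. A minimal sketch, assuming tweet locations already projected to metres (for example British National Grid eastings and northings) and a hypothetical rule: the 170 m cell a user tweets from most often is taken as the probable residence, provided it holds at least `min_tweets` tweets. The threshold of 3 and the function names are assumptions.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

CELL_SIZE_M = 170.0  # grid resolution used in the study


def grid_cell(easting: float, northing: float, cell: float = CELL_SIZE_M) -> Tuple[int, int]:
    """Index of the grid cell containing a projected point (coordinates in metres)."""
    return int(easting // cell), int(northing // cell)


def probable_residence(points: Iterable[Tuple[float, float]], min_tweets: int = 3,
                       cell: float = CELL_SIZE_M) -> Optional[Tuple[float, float]]:
    """Centre of the most-tweeted-from cell, or None if too few tweets fall in it."""
    counts = Counter(grid_cell(x, y, cell) for x, y in points)
    if not counts:
        return None
    (cx, cy), hits = counts.most_common(1)[0]
    if hits < min_tweets:  # hypothetical minimum evidence threshold
        return None
    return (cx + 0.5) * cell, (cy + 0.5) * cell
```

A majority-cell rule like this favours wherever a user tweets most, so in practice it might be restricted to night-time or weekend tweets; that refinement is not described on the slide.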

Page 9

Extracting demographic attributes of Twitter users by using their forenames and surnames

A name is a statement of the bearer’s cultural, ethnic, and linguistic identity [4]

[4] Mateos, P., Longley, P. A., O’Sullivan, D. 2011. Ethnicity and population structure in personal naming networks. PLoS ONE 6(9): e22943.

Page 10

Analysing Names on Twitter

• Some examples of name variations on Twitter

• Approx. 68% of accounts have real names

Fake Names

Castor 5.

WHAT IS LOVE?

MysticMind

KIRILL_aka_KID

Vanessa

Justin Bieber Home

Real Names

Kevin Hodge

Andre Alves

Jose de Franco

Carolina Thomas, Dr.

Prof. Martha Del Val

Fabíola Sanchez Fernandes
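The slides do not say how real names were separated from fake ones. One hypothetical heuristic, sketched below, strips honorifics such as "Dr." and "Prof.", rejects display names with fewer than two purely alphabetic tokens, and splits the rest into a forename–surname pair that a name classifier can consume. The `TITLES` set and `split_name` are assumptions for illustration, and a rule this simple would still accept a display name such as "Justin Bieber Home".

```python
import re
from typing import Optional, Tuple

# Hypothetical list of honorifics to ignore; extend as needed.
TITLES = {"dr", "prof", "mr", "mrs", "ms", "miss", "sir"}


def split_name(display_name: str) -> Optional[Tuple[str, str]]:
    """Return (forename, surname) if the display name looks like a real name, else None."""
    tokens = [t.strip(".,").lower() for t in re.split(r"[\s,]+", display_name)]
    tokens = [t for t in tokens if t and t not in TITLES]
    # Require two or more purely alphabetic tokens (str.isalpha accepts accented letters).
    if len(tokens) < 2 or not all(t.isalpha() for t in tokens):
        return None
    return tokens[0], " ".join(tokens[1:])
```

For example, `split_name("Prof. Martha Del Val")` returns `("martha", "del val")`, while `split_name("KIRILL_aka_KID")` returns `None`.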

Page 11

Onomap: Names to Ethnicity classification

• Onomap was created by clustering the names of 1 billion individuals worldwide

• Onomap (www.onomap.org) was applied to forename–surname pairs

Kevin Hodge (English)

Pablo Mateos (Spanish)
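Onomap's own classification method is not reproduced here. As a toy illustration only, the sketch below assigns a group to a forename–surname pair from two hypothetical lookup tables, preferring the surname's group and falling back to the forename's; the tables, the surname-first rule and `classify_name` are all assumptions.

```python
from typing import Dict, Optional


def classify_name(
    forename: str,
    surname: str,
    forename_groups: Dict[str, str],
    surname_groups: Dict[str, str],
) -> Optional[str]:
    """Toy lookup: use the surname's group if known, else the forename's, else None."""
    group = surname_groups.get(surname.lower())
    if group is not None:
        return group
    return forename_groups.get(forename.lower())
```

With tables containing `"hodge": "English"` and `"mateos": "Spanish"`, the pairs on the slide resolve to English and Spanish respectively.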

Page 12

Age estimation from ‘forenames’

• Monica dataset provided by CACI Ltd, UK

• Supplemented with UK birth certificate records

[5] Longley, P., Adnan, M., Lansley, G. 2013. The geo-temporal demographics of Twitter usage. Environment and Planning A (in press).
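The slide names the data sources but not the estimation rule. A minimal sketch, assuming a hypothetical table of birth counts by year for each forename (the kind of information birth records provide): take the count-weighted mean birth year, convert it to an age in 2013 (the year of the tweet data), and map that age onto the bands used later as variables V14 to V18. Because the bands 41-50 and 50+ share the value 50, this sketch places an age of exactly 50 in 41-50; that choice is an assumption.

```python
from typing import Dict

STUDY_YEAR = 2013  # year of the tweet data in this case study


def expected_birth_year(year_counts: Dict[int, int]) -> float:
    """Count-weighted mean birth year for one forename (hypothetical input)."""
    total = sum(year_counts.values())
    if total == 0:
        raise ValueError("no birth records for this forename")
    return sum(year * count for year, count in year_counts.items()) / total


def age_band(age: float) -> str:
    """Map an estimated age onto the bands of variables V14-V18."""
    if age <= 20:
        return "<=20"
    if age <= 30:
        return "21-30"
    if age <= 40:
        return "31-40"
    if age <= 50:  # an age of exactly 50 is placed here (assumption)
        return "41-50"
    return "50+"


def estimate_age_band(year_counts: Dict[int, int], ref_year: int = STUDY_YEAR) -> str:
    """Age band implied by a forename's birth-year distribution."""
    return age_band(ref_year - expected_birth_year(year_counts))
```

A mean is only one summary; the modal birth year, or keeping the whole distribution as fractional band memberships, are equally plausible choices.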

Page 13

Age distribution of Twitter users

Twitter Users vs. 2011 Census (Greater London)

[5] Longley, P., Adnan, M., Lansley, G. 2013. The geo-temporal demographics of Twitter usage. Environment and Planning A (in press).

Page 14

Land-use Categories

• Every tweet was assigned a land-use category
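Assigning a category to each tweet is a point-in-polygon problem. The sketch below uses the standard ray-casting test against a list of land-use polygons; the polygon source, the first-match rule and the fallback to "All other land uses" (variable V33) are assumptions, and a real workflow would more likely use a spatial library with an index rather than a linear scan.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray casting: count crossings of a horizontal ray cast to the right of pt."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the ray's y-coordinate can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def land_use(pt: Point, parcels: List[Tuple[str, Polygon]],
             default: str = "All other land uses") -> str:
    """Category of the first parcel containing pt, else the default category."""
    for category, poly in parcels:
        if point_in_polygon(pt, poly):
            return category
    return default
```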

Page 15

Variables for creating a geo-temporal classification

1. Residence

V1: Tweet made near probable London residence

V2: Tweeter lives ‘outside the UK’

V3: Tweeter lives in the rest of the UK outside London

2. Total Number of Tweets

V4: Total number of tweets made by the user

3. Ethnicity

V5: West European

V6: East European

V7: Greek or Turkish

V8: South East Asian

V9: Other Asian

V10: African & Caribbean

V11: Jewish

V12: Chinese

V13: Other minority

4. Age

V14: <=20

V15: 21 - 30

V16: 31 - 40

V17: 41 - 50

V18: 50+

5. Tweets outside the UK

V19: In West Europe (not including UK)

V20: In East Europe

V21: In North America

V22: In Central or South America

V23: In Australasia

V24: In Africa

V25: In Middle East

V26: In Asia

V27: In Paris

Page 16

Variables for creating a geo-temporal classification

6. Number of countries visited

V28: Number of countries the tweeter has visited

7. London Land Use Category

V29: Residential location

V30: Non-domestic buildings

V31: Transport links and locations

V32: Green-spaces

V33: All other land uses

8. 2011 London Output Area Classification (LOAC)

V34: Intermediate Lifestyles

V35: High Density and High Rise Flats

V36: Settled Asians

V37: Urban Elites

V38: City Vibe

V39: London Life-Cycle

V40: Multi-Ethnic Suburbs

V41: Ageing City Fringe

9. Temporal Scales

V42: Morning Peak Hours

V43: Week Day

V44: Afternoon

V45: Week Night

V46: Weekend
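The hour boundaries of these temporal scales are not given in the slides. The sketch below classifies each tweet timestamp into one of the five categories under hypothetical boundaries (morning peak 07:00 to 10:00, week day 10:00 to 14:00, afternoon 14:00 to 19:00, week night otherwise, weekend on Saturday and Sunday) and turns a user's tweets into proportions, which is one plausible way to express counts as the V42 to V46 variables.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable

# Hypothetical hour boundaries (24-hour clock, local time); not from the source.
MORNING_PEAK = (7, 10)
WEEK_DAY = (10, 14)
AFTERNOON = (14, 19)

CATEGORIES = ["Morning Peak Hours", "Week Day", "Afternoon", "Week Night", "Weekend"]


def temporal_category(ts: datetime) -> str:
    """Assign a single temporal category to one tweet timestamp."""
    if ts.weekday() >= 5:  # Saturday = 5, Sunday = 6
        return "Weekend"
    hour = ts.hour
    if MORNING_PEAK[0] <= hour < MORNING_PEAK[1]:
        return "Morning Peak Hours"
    if WEEK_DAY[0] <= hour < WEEK_DAY[1]:
        return "Week Day"
    if AFTERNOON[0] <= hour < AFTERNOON[1]:
        return "Afternoon"
    return "Week Night"


def temporal_profile(timestamps: Iterable[datetime]) -> Dict[str, float]:
    """Share of a user's tweets falling in each temporal category."""
    counts = Counter(temporal_category(ts) for ts in timestamps)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in CATEGORIES}
```

Using proportions rather than raw counts keeps very active users from dominating these variables; total activity is already captured separately by V4.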

Page 17

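Computing the geo-temporal classifications

• Segmentations were created using the K-means clustering algorithm

• K-means finds the cluster centroids by minimising the within-cluster sum of squares

$$V = \sum_{x=1}^{n} \sum_{y=1}^{n_x} \left( z_y - \mu_x \right)^2$$

where n is the number of clusters, n_x is the number of users in cluster x, z_y is the variable vector of user y, µ_x is the centroid of cluster x, and (z_y − µ_x)² is the squared Euclidean distance between them

• Seven clusters
   • Group A: London Residents
   • Group B: Commuting Professionals
   • Group C: Student Lifestyle
   • Group D: The Daily Grind
   • Group E: Spectators
   • Group F: Visitors
   • Group G: Workplace and tourist activity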
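This minimisation is usually carried out with Lloyd's algorithm, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. A minimal NumPy sketch, assuming each row of `X` is one user's vector of classification variables (V1 to V46) and `k = 7` as in the study; the random initialisation, the convergence test and any scaling of the variables are assumptions, since the slides do not specify them.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids, V) where V is the objective."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if the cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Within-cluster sum of squares V.
    V = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, V
```

For example, `labels, centroids, V = kmeans(X, k=7)` returns a cluster index per user, the seven centroids and the value of V. Because K-means only reaches a local minimum, it is common to run it from several random seeds and keep the solution with the smallest V.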

Page 18

Group A: London Residents

• Tweets made near primary residential locations

• Tweets made on weeknights or weekends

Page 19

Group B: Commuting Professionals

• Tweets made from:
   • Transport locations
   • ‘Urban Elites’ LOAC classification

• Tweets made by individuals of intermediate age (21-30)

Page 20

Group F: Visitors

• Tweeters live outside London

• Tweets originate from residential land uses

• Mixed age groups

Page 21

Group G: Workplace and tourist activity

• Tweets sent from non-domestic buildings

• Full range of Twitter age cohorts

• Tweets originate from a mix of residents and international visitors

Page 22

Conclusion

• Geo-temporal demographic classifications
   • Census (night-time geography)
   • Social media data (daytime and travel-time geography)
   • Issues of representation

• An insight into the residential and travel geographies of individuals

• An insight into the spatial activity patterns of different kinds of social media users

Page 23

Any questions?

Thank you for listening