Uncertainty of Identity: Classifying Twitter Data
DESCRIPTION
This presentation proposes methods for classifying Twitter data. Online social networks have grown rapidly all over the world in recent years. Here we present an analysis of Twitter data that identifies aspects of cultural and ethnic identity.
TRANSCRIPT
Uncertainty of Identity: Classifying Twitter Data
Muhammad Adnan (and Prof. Paul Longley)
University College London
Uncertainty of Identity: Project Aims
• A combined project between UCL, City University, and University of Birmingham
• Combining real and virtual world datasets to better understand the identity of individuals
• Real world datasets (Surname data, socio-economic datasets)
• Virtual world datasets (Email addresses, Social media accounts)
My research interests
• Data mining
• Analysis of Twitter data
• Visualisation of the data
Twitter (www.twitter.com)
• Online social-networking and micro blogging service
• Launched in 2006; six years on, Twitter has 500 million active users
• Generates 350 million tweets daily
• One of the top 10 most visited websites on the internet
• Twitter API can be used to download live tweets
Twitter API’s data
• User Creation Date
• Followers
• Friends
• User ID
• Language
• Location
• Name
• Screen Name
• Time Zone
• Geo Enabled
• Latitude
• Longitude
• Tweet date and time
• Tweet text
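A minimal sketch of how the fields listed above might be pulled out of one downloaded tweet. It assumes the tweet arrives as a JSON object laid out like the Twitter v1.1 REST API (keys such as `followers_count`, `screen_name` and `geo_enabled`, and a GeoJSON-style `coordinates` point). `TweetRecord` and `parse_tweet` are illustrative names, not part of the project's code.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class TweetRecord:
    """One tweet flattened into the fields listed on the slide."""
    user_created: Optional[str]
    followers: Optional[int]
    friends: Optional[int]
    user_id: Optional[int]
    language: Optional[str]
    location: Optional[str]
    name: Optional[str]
    screen_name: Optional[str]
    time_zone: Optional[str]
    geo_enabled: bool
    latitude: Optional[float]
    longitude: Optional[float]
    tweet_created: Optional[str]
    text: str


def parse_tweet(tweet: Dict[str, Any]) -> TweetRecord:
    """Flatten a tweet dict; the key names are assumed (v1.1-style layout)."""
    user = tweet.get("user") or {}
    # Assumption: coordinates are a GeoJSON point stored as [longitude, latitude].
    point = (tweet.get("coordinates") or {}).get("coordinates")
    longitude, latitude = (point[0], point[1]) if point else (None, None)
    return TweetRecord(
        user_created=user.get("created_at"),
        followers=user.get("followers_count"),
        friends=user.get("friends_count"),
        user_id=user.get("id"),
        language=user.get("lang"),
        location=user.get("location"),
        name=user.get("name"),
        screen_name=user.get("screen_name"),
        time_zone=user.get("time_zone"),
        geo_enabled=bool(user.get("geo_enabled")),
        latitude=latitude,
        longitude=longitude,
        tweet_created=tweet.get("created_at"),
        text=tweet.get("text", ""),
    )
```

Tweets from users who have not enabled geolocation simply come back with `latitude` and `longitude` set to `None`, so they can still be used for name-based analysis.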
Classifying Twitter Data to ethnic origins
• Some examples of NAME variations on Twitter
Real Names
Kevin Hodge
Andre Alves
Jose de Franco
Carolina Thomas, Dr.
Prof. Martha Del Val
Fabíola Sanchez Fernandes
Fake Names
Castor 5.
WHAT IS LOVE?
MysticMind
KIRILL_aka_KID
Vanessa
Petuna
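The contrast between the two lists above suggests a cleaning step before any name classification. The following is only a sketch of one possible heuristic, not the method used in the project: it drops titles, rejects names containing digits, underscores or punctuation, and requires at least a forename and a surname. The `TITLES` set and the function name `extract_name_pair` are hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical list of titles to discard; extend as needed.
TITLES = {"dr", "prof", "mr", "mrs", "ms", "miss"}


def extract_name_pair(raw: str) -> Optional[Tuple[str, str]]:
    """Return (forename, surname) for a plausible real name, else None."""
    # Split on whitespace and commas, and drop full stops around each token.
    tokens = [t.strip(".") for t in re.split(r"[\s,]+", raw.strip())]
    tokens = [t for t in tokens if t and t.lower() not in TITLES]
    # A single token ("Vanessa", "MysticMind") gives no forename + surname pair.
    if len(tokens) < 2:
        return None
    # Reject tokens with digits, underscores or punctuation ("5", "LOVE?").
    # Hyphens and apostrophes are allowed inside names.
    for token in tokens:
        if not token.replace("-", "").replace("'", "").isalpha():
            return None
    # First token is the forename; the rest is treated as the surname.
    return tokens[0], " ".join(tokens[1:])
```

Applied to the examples above, "Prof. Martha Del Val" yields `("Martha", "Del Val")`, while "Castor 5." and "KIRILL_aka_KID" are rejected. A heuristic like this still lets through word pairs that merely look like names, so it only narrows the candidates.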
Top Twitter Users
Where they tweet from:
[Maps of tweet locations for users with the surnames JONES, DEE, and SHAH]
Classifying Twitter Data to ethnic origins
• Applied ONOMAP (www.onomap.org) on FORENAME + SURNAME pairs
Kevin Hodge (ENGLISH)
Andre de Franco (ITALIAN)
…
…
…
…
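ONOMAP's own interface and classification method are not shown in the slides, so the sketch below does not call it. It only illustrates the general shape of classifying forename + surname pairs against lookup tables and counting the results. The tables hold just the two examples from the slide, and the surname-first precedence, the function names and the `UNCLASSIFIED` label are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

UNCLASSIFIED = "UNCLASSIFIED"

# Toy tables standing in for a real name classification; only the two
# slide examples are included, keyed by lower-cased name.
SURNAME_TABLE: Dict[str, str] = {"hodge": "ENGLISH", "de franco": "ITALIAN"}
FORENAME_TABLE: Dict[str, str] = {}


def classify_pair(forename: str, surname: str,
                  surnames: Dict[str, str] = SURNAME_TABLE,
                  forenames: Dict[str, str] = FORENAME_TABLE) -> str:
    """Look up the surname first, then fall back to the forename (assumed order)."""
    group = surnames.get(surname.casefold())
    if group is not None:
        return group
    return forenames.get(forename.casefold(), UNCLASSIFIED)


def tally_groups(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count how many name pairs fall into each group."""
    return Counter(classify_pair(forename, surname) for forename, surname in pairs)
```

Names found in neither table are kept under `UNCLASSIFIED` rather than dropped, so the share of users the classification cannot place stays visible in the counts.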
Twitter Ethnicity Maps
[Map panels: English, Scottish, Welsh, Italian, Pakistani, Chinese, Spanish, Indian, Polish, German, French, Portuguese, Bangladeshi, African, Irish]
[Map panels: Spanish, German]
[Map panels: French, African]
[Map panels: English, Italian, Pakistani, Indian, Turkish, Greek, Bangladeshi, Spanish, German, French, Portuguese, Sikh]
[Map panels: Chinese, Polish, Jewish, Swedish, Nigerian, Somalian, Ghanaian, Sri Lankan, Danish]
[Map panels: Chinese, Polish, Jewish, Swedish, Somalian, Ghanaian]
http://www.guardian.co.uk/news/datablog/
London
Which places are they talking about?
• Tweets containing ‘London’ in their text string
• Text-matching algorithms remove tweets that mention places other than London, e.g. London Road or London, Ontario
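The filtering step above can be sketched with regular expressions. This is an assumed, simplified version of the text matching: the exclusion patterns cover only the two examples named on the slide (London Road and London, Ontario), and `mentions_london_city` is an illustrative name.

```python
import re

# Matches the word "London" on its own (not "Londonderry").
LONDON = re.compile(r"\blondon\b", re.IGNORECASE)

# Hypothetical exclusion patterns, limited to the examples on the slide.
NOT_LONDON = re.compile(
    r"\blondon\s+(?:road|rd)\b"      # street names such as "London Road"
    r"|\blondon\s*,\s*ontario\b",    # the city in Canada
    re.IGNORECASE,
)


def mentions_london_city(text: str) -> bool:
    """True if the tweet mentions London outside the excluded phrases."""
    # Blank out the excluded phrases, then look for any remaining "London".
    remaining = NOT_LONDON.sub(" ", text)
    return bool(LONDON.search(remaining))
```

Removing the excluded phrases before searching means a tweet such as "From London Road to central London" is still kept, because one genuine mention remains.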
New York
Which places are they talking about?
Madrid
Which places are they talking about?
Twitter Language Maps
Conclusion
• Use of social media is increasing day by day
• Social-media datasets can give an insight into people’s behaviour in virtual worlds
• Investigating ethnic origins in other countries to draw inferences about migration trends in developed and developing countries
• Future work will investigate Foursquare and Facebook data
Any Questions?
Thank you for Listening
Web: http://www.uncertaintyofidentity.com
Email: [email protected]
Twitter: @gisandtech