Uncertainty of Identity: Classifying Twitter Data Muhammad Adnan (and Prof. Paul Longley) University College London

Upload: muhammad-adnan

Post on 12-May-2015


DESCRIPTION

This presentation proposes methods for classifying Twitter data. Online social networks have grown rapidly all over the world in recent years. Here we present an analysis of Twitter data that aims to identify aspects of cultural and ethnic identity.

TRANSCRIPT

Page 1: Uncertainty of Identity: Classifying Twitter Data

Uncertainty of Identity: Classifying Twitter Data

Muhammad Adnan (and Prof. Paul Longley)

University College London

Page 2: Uncertainty of Identity: Classifying Twitter Data

Uncertainty of Identity: Project Aims

• A combined project between UCL, City University, and University of Birmingham

• Combining real and virtual world datasets to better understand the identity of individuals

• Real world datasets (Surname data, socio-economic datasets)

• Virtual world datasets (Email addresses, Social media accounts)

My research interests

• Data mining

• Analysis of Twitter data

• Visualisation of the data

Page 3: Uncertainty of Identity: Classifying Twitter Data

Twitter (www.twitter.com)

• Online social-networking and micro blogging service

• Launched in 2006; six years on, Twitter has 500 million active users

• Generates 350 million tweets daily

• One of the top 10 most visited websites on the internet

• The Twitter API can be used to download live tweets

Page 4: Uncertainty of Identity: Classifying Twitter Data

Twitter API’s data

• User Creation Date

• Followers

• Friends

• User ID

• Language

• Location

• Name

• Screen Name

• Time Zone

• Geo Enabled

• Latitude

• Longitude

• Tweet date and time

• Tweet text
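
The fields listed above could be held in a simple record type for later analysis. The sketch below is only an illustration: the class name `TweetRecord`, the attribute names, and the helper `has_coordinates` are assumptions made here, not the Twitter API's own field names or the authors' data model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class TweetRecord:
    """One downloaded tweet plus the user fields listed on the slide.

    Attribute names are illustrative, not the Twitter API's field names.
    """
    user_id: int
    name: str
    screen_name: str
    user_created_at: datetime
    followers: int
    friends: int
    language: str
    location: str
    time_zone: Optional[str]
    geo_enabled: bool
    latitude: Optional[float]
    longitude: Optional[float]
    tweeted_at: datetime
    text: str

    def has_coordinates(self) -> bool:
        """True when the tweet carries a usable latitude/longitude pair."""
        return (
            self.geo_enabled
            and self.latitude is not None
            and self.longitude is not None
        )
```

Only records for which `has_coordinates()` is true could be placed on a map; the rest would still be usable for name or language analysis.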

Page 5: Uncertainty of Identity: Classifying Twitter Data
Page 6: Uncertainty of Identity: Classifying Twitter Data
Page 7: Uncertainty of Identity: Classifying Twitter Data
Page 8: Uncertainty of Identity: Classifying Twitter Data

Classifying Twitter Data to ethnic origins

• User Creation Date

• Followers

• Friends

• User ID

• Language

• Location

• Name

• Screen Name

• Time Zone

• Geo Enabled

• Latitude

• Longitude

• Tweet date and time

• Tweet text

Page 9: Uncertainty of Identity: Classifying Twitter Data

Classifying Twitter Data to ethnic origins

• Some examples of NAME variations on Twitter

Real Names

Kevin Hodge

Andre Alves

Jose de Franco

Carolina Thomas, Dr.

Prof. Martha Del Val

Fabíola Sanchez Fernandes

Fake Names

Castor 5.

WHAT IS LOVE?

MysticMind

KIRILL_aka_KID

Vanessa

Petuna
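
One way to separate the two groups of names above is a simple plausibility check. The slides do not say how real and fake names were told apart, so the sketch below is an assumption throughout: the rules (drop titles, require at least two tokens, allow only letters, hyphens and apostrophes) and the names `TITLES`, `clean_name` and `looks_like_real_name` are hypothetical, chosen so that the examples on this slide fall on the expected side.

```python
import re
from typing import List

# Titles to ignore; an illustrative list, not an exhaustive one.
TITLES = {"dr", "prof", "mr", "mrs", "ms"}


def clean_name(raw: str) -> List[str]:
    """Split a profile name into tokens, dropping titles such as 'Dr.'."""
    tokens = re.split(r"[\s,]+", raw.strip())
    return [t for t in tokens if t and t.rstrip(".").lower() not in TITLES]


def looks_like_real_name(raw: str) -> bool:
    """Heuristic: at least two tokens, each made of letters only.

    Hyphens and apostrophes are tolerated inside a token.
    """
    tokens = clean_name(raw)
    if len(tokens) < 2:
        return False
    return all(t.replace("-", "").replace("'", "").isalpha() for t in tokens)


real = ["Kevin Hodge", "Carolina Thomas, Dr.", "Prof. Martha Del Val"]
fake = ["Castor 5.", "WHAT IS LOVE?", "MysticMind", "KIRILL_aka_KID", "Vanessa"]
print([looks_like_real_name(n) for n in real])
print([looks_like_real_name(n) for n in fake])
```

A rule set like this will still let through invented two-word names and reject genuine single-word ones, so it is best seen as a first-pass filter.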

Page 10: Uncertainty of Identity: Classifying Twitter Data

Top Twitter Users

Page 11: Uncertainty of Identity: Classifying Twitter Data

Where they tweet from:

Surname: JONES

Page 12: Uncertainty of Identity: Classifying Twitter Data

Where they tweet from:

Surname: DEE

Page 13: Uncertainty of Identity: Classifying Twitter Data

Where they tweet from:

Surname: SHAH

Page 14: Uncertainty of Identity: Classifying Twitter Data

Classifying Twitter Data to ethnic origins

• Applied ONOMAP (www.onomap.org) to FORENAME + SURNAME pairs

Kevin Hodge (ENGLISH)

Andre de Franco (ITALIAN)
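
The forename + surname step can be sketched as a split followed by a lookup. The code below does not use ONOMAP or reproduce its interface or data: `TOY_LABELS` is a hypothetical stand-in table containing only the two examples from this slide, and `split_name` and `classify` are names invented for the illustration.

```python
from typing import Optional, Tuple

# Toy lookup built only from the two examples on the slide; a stand-in for
# a real name classifier, not ONOMAP's data or interface.
TOY_LABELS = {
    ("KEVIN", "HODGE"): "ENGLISH",
    ("ANDRE", "DE FRANCO"): "ITALIAN",
}


def split_name(full_name: str) -> Optional[Tuple[str, str]]:
    """Return (FORENAME, SURNAME) in upper case, or None for one-word names.

    Assumes the first token is the forename and the rest is the surname.
    """
    parts = full_name.split()
    if len(parts) < 2:
        return None
    return parts[0].upper(), " ".join(parts[1:]).upper()


def classify(full_name: str) -> Optional[str]:
    """Look the name pair up in the toy table; None when unclassified."""
    pair = split_name(full_name)
    if pair is None:
        return None
    return TOY_LABELS.get(pair)


print(classify("Kevin Hodge"))
print(classify("Vanessa"))
```

Returning `None` for names that cannot be split or are not in the table keeps unclassified users separate from classified ones.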

Page 15: Uncertainty of Identity: Classifying Twitter Data

Twitter Ethnicity Maps

Map labels: English, Scottish, Welsh, Italian, Pakistani, Chinese, Spanish, Indian, Polish, German, French, Portuguese, Bangladeshi, African, Irish

Page 16: Uncertainty of Identity: Classifying Twitter Data

Twitter Ethnicity Maps

Map labels: English, Scottish, Welsh, Italian, Pakistani, Chinese, Spanish, Indian, Polish, German, French, Portuguese, Bangladeshi, African, Irish

Page 17: Uncertainty of Identity: Classifying Twitter Data

Twitter Ethnicity Maps

Map labels: Spanish, German

Page 18: Uncertainty of Identity: Classifying Twitter Data

Twitter Ethnicity Maps

Map labels: French, African

Page 19: Uncertainty of Identity: Classifying Twitter Data

Twitter Ethnicity Maps

Map labels: English, Italian, Pakistani, Indian, Turkish, Greek, Bangladeshi, Spanish, German, French, Portuguese, Sikh

Page 20: Uncertainty of Identity: Classifying Twitter Data

Twitter Ethnicity Maps

Map labels: Chinese, Polish, Jewish, Swedish, Nigerian, Somalian, Ghanaian, Sri Lankan, Danish

Page 21: Uncertainty of Identity: Classifying Twitter Data

Twitter Ethnicity Maps

Map labels: Chinese, Polish, Jewish, Swedish, Somalian, Ghanaian

http://www.guardian.co.uk/news/datablog/

Page 22: Uncertainty of Identity: Classifying Twitter Data

London

Which places are they talking about?

• Tweets containing ‘London’ in their text string

• Applying text-matching algorithms to remove tweets that mention places which are not London, e.g. London Road or London, Ontario
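
The filtering described above could be approximated with regular expressions. The slides do not name the text-matching algorithms used, so this is a minimal sketch under assumptions: the exclusion list in `NOT_LONDON` is seeded only from the two examples given (London Road and London, Ontario), and the function name `mentions_london_city` is hypothetical.

```python
import re

# Any standalone mention of "London", case-insensitive.
LONDON = re.compile(r"\blondon\b", re.IGNORECASE)

# Hypothetical exclusion patterns, seeded from the two examples on the slide.
NOT_LONDON = re.compile(
    r"\blondon\s+(?:road|rd)\b|\blondon\s*,\s*ontario\b",
    re.IGNORECASE,
)


def mentions_london_city(text: str) -> bool:
    """True if 'London' still appears after excluded phrases are blanked out."""
    remaining = NOT_LONDON.sub(" ", text)
    return LONDON.search(remaining) is not None


tweets = [
    "Lovely day in London",
    "Stuck on London Road again",
    "Greetings from London, Ontario",
    "From London Road to central London",
]
kept = [t for t in tweets if mentions_london_city(t)]
print(kept)
```

Blanking out the excluded phrases before searching means a tweet that mentions both London Road and London itself is still kept. A real exclusion list would need many more entries than the two shown here.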

Page 23: Uncertainty of Identity: Classifying Twitter Data

New York

Which places are they talking about?

Page 24: Uncertainty of Identity: Classifying Twitter Data

Madrid

Which places are they talking about?

Page 25: Uncertainty of Identity: Classifying Twitter Data

Twitter Language Maps

Page 26: Uncertainty of Identity: Classifying Twitter Data

Twitter Language Maps

Page 27: Uncertainty of Identity: Classifying Twitter Data

Twitter Language Maps

Page 28: Uncertainty of Identity: Classifying Twitter Data

Conclusion

• Use of social media is increasing day by day

• Social-media datasets can give an insight into people’s behaviour in virtual worlds

• Investigation of ethnic origins in other countries to draw inferences about migration trends in developed and developing countries

• Future work will involve the investigation of Foursquare and Facebook data

Page 29: Uncertainty of Identity: Classifying Twitter Data

Any Questions?

Thank you for listening

Web: http://www.uncertaintyofidentity.com

Email: [email protected]

Twitter: @gisandtech