PhD Colloquium Spatial Analysis
DESCRIPTION
Presentation given as part of a PhD Colloquium on Spatial Analysis, delivered on Wed 11th January 2013.
TRANSCRIPT
![Page 1: Phd Colloquium Spatial Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54831f39b4af9ffb6c8b4584/html5/thumbnails/1.jpg)
Data Mining to Understand International Dimensions to Online Identity - a classification of 2+ billion names and their linkage to virtual identities and social network traffic.
• Alistair Leak
• UCL SECReT
• [email protected]
![Page 2: Phd Colloquium Spatial Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54831f39b4af9ffb6c8b4584/html5/thumbnails/2.jpg)
Who am I?
Education:
Kingston University (BSc) - GIS
UCL (M.Res) - Advanced Spatial Analysis and Visualisation
UCL 3+1 - PhD Security and Crime Science
Supervisors:
1st Supervisor: Professor Paul Longley
2nd Supervisor: Dr James Cheshire
![Page 3: Phd Colloquium Spatial Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54831f39b4af9ffb6c8b4584/html5/thumbnails/3.jpg)
Definitions:
• Netnography
  – “A qualitative, interpretive research methodology that uses internet-optimized ethnographic research techniques to study the social context in online communities” (Kozinets, 2009)
• Cybergeodemographics
  – “The analysis of people by where they live and by whom they interact with, in real and virtual space” (Longley, 2012)
![Page 4: Phd Colloquium Spatial Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54831f39b4af9ffb6c8b4584/html5/thumbnails/4.jpg)
Uncertainty of Identity: Work Package 4: Cybergeodemographics
• Use of primary and secondary data to relate virtual Internet traffic to the probable physical locations from which it emanated; and the development of typologies of social networks that are robust, generalized and related to physical locations.
Data Collection Tools (WP1)
Text Analytics (WP2)
Cybergeodemographics (WP4)
Secondary Data
![Page 5: Phd Colloquium Spatial Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54831f39b4af9ffb6c8b4584/html5/thumbnails/5.jpg)
Working Title:
• “Data Mining to Understand International Dimensions to Online Identity - a classification of 2+ billion names and their linkage to virtual identities and social network traffic”
Objectives:
• Develop spatial context of name network classification
• Develop typologies of social networks
• Measure how representative social media is of the underlying population.
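One simple way to express the third objective is a representation ratio: a group's share of social media users divided by its share of the underlying population. The sketch below is only an illustration of that idea, not the method used in the project. The function name `representation_ratios` and the group labels and counts are hypothetical.

```python
def representation_ratios(user_counts, population_counts):
    """Return (share of users) / (share of population) for each group.

    A ratio above 1 suggests the group is over-represented among users,
    a ratio below 1 suggests it is under-represented.
    Groups with zero population are skipped to avoid division by zero.
    """
    total_users = sum(user_counts.values())
    total_population = sum(population_counts.values())
    if total_users == 0 or total_population == 0:
        raise ValueError("both totals must be greater than zero")

    ratios = {}
    for group, population in population_counts.items():
        if population == 0:
            continue
        user_share = user_counts.get(group, 0) / total_users
        population_share = population / total_population
        ratios[group] = user_share / population_share
    return ratios


# Illustrative counts only; these are not figures from the study.
example_users = {"group_a": 30, "group_b": 70}
example_population = {"group_a": 50, "group_b": 50}
example_ratios = representation_ratios(example_users, example_population)
```

With these made-up counts, `group_a` holds 30% of users but 50% of the population, so its ratio falls below 1, while `group_b` rises above 1.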
![Page 6: Phd Colloquium Spatial Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54831f39b4af9ffb6c8b4584/html5/thumbnails/6.jpg)
Work Plan
• M.Res (Present – 2013)
  – Foundation work
    • Assess how representative tweet data are
  – Skills Development
    • Spatio-Temporal Data Mining
    • Database Management
• Ph.D (2013 – 2016)
  – Objectives
    • Develop spatial component of names networks
    • Develop typologies of social networks
    • Develop a measure of uncertainty
  – Completion in August 2016
![Page 7: Phd Colloquium Spatial Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54831f39b4af9ffb6c8b4584/html5/thumbnails/7.jpg)
Data Sources:
*Sina Weibo
![Page 8: Phd Colloquium Spatial Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54831f39b4af9ffb6c8b4584/html5/thumbnails/8.jpg)
Case Study: Tweets in London
• 1.4 million tweets over 3 months (Sep - Dec 2012)
![Page 9: Phd Colloquium Spatial Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54831f39b4af9ffb6c8b4584/html5/thumbnails/9.jpg)
What’s in a Tweet?
First Name
Surname
Unique ID
Popularity
Interactions
# Themes
Possibilities:
• Political Affiliation
• Gender
• Age
• Location
Time/Date
Location
![Page 10: Phd Colloquium Spatial Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54831f39b4af9ffb6c8b4584/html5/thumbnails/10.jpg)
Data Classification
• Gender
  – Database of 62,000 names + genders
  – Determined by forename
• Demographic
  – OAC – Output Area Classification
• ONOMAP
  – Ethnicity, religion, geographical origin
  – Determined by forename-surname combination
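The forename-based gender step can be sketched as a dictionary lookup on the first token of a user's display name. This is a minimal illustration under stated assumptions, not the project's implementation: `FORENAME_GENDER` is a tiny hypothetical stand-in for the 62,000-name database, and the helper names are invented for this example.

```python
# Hypothetical stand-in for the forename/gender database (assumption:
# lower-case forename keys mapped to "F" or "M").
FORENAME_GENDER = {"alice": "F", "maria": "F", "david": "M", "john": "M"}


def split_display_name(display_name):
    """Split a display name into (forename, surname).

    The first token is treated as the forename and the last token as the
    surname; a single-token name has no surname.
    """
    parts = display_name.strip().split()
    if not parts:
        return None, None
    forename = parts[0]
    surname = parts[-1] if len(parts) > 1 else None
    return forename, surname


def classify_gender(display_name, lookup=FORENAME_GENDER):
    """Return "F", "M", or "unknown" based on the forename alone."""
    forename, _ = split_display_name(display_name)
    if forename is None:
        return "unknown"
    return lookup.get(forename.lower(), "unknown")
```

Display names that are not real names (handles, brands, emoji) simply fall through to `"unknown"`, which is one reason the classified sample is smaller than the collected one.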
![Page 11: Phd Colloquium Spatial Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54831f39b4af9ffb6c8b4584/html5/thumbnails/11.jpg)
Data Classification
![Page 12: Phd Colloquium Spatial Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54831f39b4af9ffb6c8b4584/html5/thumbnails/12.jpg)
Tweets by ONOMAP Religion
![Page 13: Phd Colloquium Spatial Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54831f39b4af9ffb6c8b4584/html5/thumbnails/13.jpg)
Tweets by ONOMAP Religion
![Page 14: Phd Colloquium Spatial Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54831f39b4af9ffb6c8b4584/html5/thumbnails/14.jpg)
Tweets by ONOMAP Group
![Page 15: Phd Colloquium Spatial Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54831f39b4af9ffb6c8b4584/html5/thumbnails/15.jpg)
Challenges of Study
• Signal from Noise
  – Tweets are not all sent from individuals' homes
    • Day and night demographics
  – Not all geolocated tweets are sent by real people
• Data Quality/Sample Size
  – Twitter users are self-selecting
    • Only a small proportion have enabled location services
    • Dataset currently has 92,000 unique users
![Page 16: Phd Colloquium Spatial Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54831f39b4af9ffb6c8b4584/html5/thumbnails/16.jpg)
Target Areas of Study
• Spatio-temporal differentiation of tweets
  – Night
  – Day
  – Travel
• Expansion of the methodology for World Names
  – Initially into Europe
• Application of new name datasets
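A first pass at the night/day/travel split could bucket each tweet by its local hour. The sketch below assumes hour bands that are not given in the presentation (travel at 07:00-09:59 and 17:00-19:59, day at 10:00-16:59, night otherwise), and the function names are hypothetical. A time-only rule cannot confirm that a user is actually travelling; it only marks the periods in which travel is most likely.

```python
from datetime import datetime

# Assumed hour bands in local time; these thresholds are illustrative
# and are not taken from the presentation.
TRAVEL_HOURS = set(range(7, 10)) | set(range(17, 20))
DAY_HOURS = set(range(10, 17))


def time_band(timestamp):
    """Label a tweet timestamp as "travel", "day", or "night"."""
    hour = timestamp.hour
    if hour in TRAVEL_HOURS:
        return "travel"
    if hour in DAY_HOURS:
        return "day"
    return "night"


def count_by_band(timestamps):
    """Count tweets falling in each time band."""
    counts = {"night": 0, "day": 0, "travel": 0}
    for timestamp in timestamps:
        counts[time_band(timestamp)] += 1
    return counts
```

Comparing where the same users tweet in the night and day bands is one way to approach the "day and night demographics" problem, since night-time locations are more likely to be close to home.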
![Page 17: Phd Colloquium Spatial Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54831f39b4af9ffb6c8b4584/html5/thumbnails/17.jpg)
References:
• Dale, M. R. T., and M-J. Fortin. "From graphs to spatial graphs." Annual Review of Ecology, Evolution, and Systematics 41.1 (2010): 21.
• Fischer, E. (July, 2011). World Map of Flickr and Twitter Locations. In See Something or Say Something. Available at http://www.flickr.com/photos/walkingsf/5912169471/in/set-72157627140310742
• http://urbantick.blogspot.co.uk/2010/12/ncl-social-networks.html
• Kozinets, Robert V. Netnography: Doing ethnographic research online. Sage Publications Limited, 2009.
• R Core Team (2012). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.
• Rao, D., Yarowsky, D., Shreevats, A., & Gupta, M. (2010, October). Classifying latent user attributes in twitter. In Proceedings of the 2nd international workshop on Search and mining user-generated contents (pp. 37-44). ACM.
![Page 18: Phd Colloquium Spatial Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54831f39b4af9ffb6c8b4584/html5/thumbnails/18.jpg)
Thank you
X Factor Graph
Produced with R and Gephi