Location Mining from Online Social Networks

Posted on 27-Jan-2016

TRANSCRIPT

  • Location Mining from Online Social Networks. Satyen Abrol. Advisors: Dr. Latifur Khan, Dr. Bhavani Thuraisingham

  • Location Mining in Online Social Networks: What is the city-level home location of a user?

  • Outline: Introduction and Problem Statement; Different Approaches; Social Graph Based: Our Approaches; Tweethood: Fuzzy k Closest Friends with Variable Depth; Tweecalization: Label Propagation; Tweeque: Graph Partitioning for Spatio-Temporal Analysis; Experiments and Results; Future Work

  • Why is Location Important? Privacy and security; trustworthiness; location-driven mining for business. Location-based social networking is projected to generate US $21.14 billion by 2015 [1], but only ~14.3% of users provide their location explicitly [2].

    [1] According to a new report by Global Industry Analysts, Inc. (GIA) (http://www.strategyR.com/). [2] According to an experiment we performed on 1 million users.

  • Twitter Basics: Tweets have a maximum of 140 characters.

    Profile fields: # of tweets, # of following, # of followers, location.

  • Why is location so important?

  • Privacy and Security: Losing locational privacy forever. Users leave the field blank because they don't want strangers to know their locations (see http://pleaserobme.com/).

  • Trustworthiness: Corporations use social media for better advertising and marketing. During the 2009 Iran elections, the US State Department used Twitter as a source.

    Trustworthiness is important in such cases.

    We need to be able to trust/verify the correctness of the location mentioned in a user's profile.

  • Marketing and Business: Large corporations such as Walmart, Starbucks, and United Airlines use social media. It is a great tool for inexpensive advertising and for getting feedback from users.

  • The Problem: Users leave the location field blank in their Twitter profiles. They do not provide valid geographic information ("Justin Bieber's heart", "NON YA BISNESS!!", "looking down on u people"). They provide incorrect locations that may actually exist in the real world ("Nothing in Arizona", "Little Heaven in Connecticut"). They provide several locations, which makes it difficult to identify the home location ("CALi b0Y $TuCC iN V3Ga$": a California boy stuck in Las Vegas, NV). About 35% enter just a country, state, county, etc., and no city-level location [1].

    [1] B. Hecht, L. Hong, B. Suh, and E. H. Chi, "Tweets from Justin Bieber's heart: the dynamics of the location field in user profiles," in SIGCHI '11.

  • Location Prediction in Social Networks: Two approaches, content based [1, 2] and using the social graph [3, 4, 5].

    [1] Z. Cheng, J. Caverlee, and K. Lee, "You are where you tweet: A content-based approach to geo-locating Twitter users," in CIKM '10.
    [2] B. Hecht, L. Hong, B. Suh, and E. H. Chi, "Tweets from Justin Bieber's heart: the dynamics of the location field in user profiles," in SIGCHI '11.
    [3] S. Abrol, L. Khan, and B. Thuraisingham, "Tweeque: Spatio-Temporal Analysis of Social Networks for Location Mining Using Graph Partitioning," The First ASE/IEEE International Conference on Social Informatics, December 14-16, 2012, Washington D.C., USA.
    [4] S. Abrol, L. Khan, and B. Thuraisingham, "Tweecalization: Efficient and intelligent location mining in Twitter using semi-supervised learning," 8th IEEE International Conference on Collaborative Computing, October 14-17, 2012, Pittsburgh, Pennsylvania.
    [5] S. Abrol and L. Khan, "Agglomerative clustering on fuzzy k-closest friends with variable depth for location mining," The Second IEEE International Conference on Social Computing (SocialCom2010), August 20-22, 2010, Minneapolis, Minnesota.

  • Content-Based Approach: Inaccurate, because a location mentioned in the text is not necessarily the location of the user.

    It involves ambiguity: "Paris" can mean Paris Hilton; Paris, the capital of France; or Paris, a town in Texas. It is also slow, since it uses NLP/machine learning techniques and searches gazetteers.

  • Using Social Graphs: Based on the Japanese proverb "When the character of a man is not clear to you, look at his friends." It exploits the relationship between geospatial proximity and friendship and uses classical data mining algorithms for more accurate results. It is faster and can be used for real-world applications.

  • Geospatial Proximity and Friendship: Form $10^{12}$ Twitter user pairs and identify the geographic distance between the users in each pair.

    The curve follows a power law of the form $a(x+b)^{-c}$, with exponent $-c = -0.87$.
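    This is a minimal sketch of how such an exponent can be recovered, assuming the offset b is known. Taking logs gives log y = log a - c log(x + b), which is a straight line, so ordinary least squares on the transformed points yields c. The sample distances and the values a = 0.5 and b = 1 are hypothetical. Only c = 0.87 comes from the slide, and the authors' actual fitting procedure is not described here.

```python
import math


def power_law(x, a, b, c):
    """Model from the slide: a * (x + b) ** (-c), where x is geographic distance."""
    return a * (x + b) ** (-c)


def fit_power_law(xs, ys, b):
    """Fit a and c by least squares on log(y) = log(a) - c * log(x + b).

    Assumes b is known in advance; returns (a, c).
    """
    lx = [math.log(x + b) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    slope = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
             / sum((u - mx) ** 2 for u in lx))
    intercept = my - slope * mx
    return math.exp(intercept), -slope


# Hypothetical noise-free samples generated with a = 0.5, b = 1, c = 0.87
xs = [1, 10, 100, 1000, 5000]
ys = [power_law(x, a=0.5, b=1.0, c=0.87) for x in xs]
a_hat, c_hat = fit_power_law(xs, ys, b=1.0)
print(round(a_hat, 3), round(c_hat, 3))  # 0.5 0.87
```

    With real, noisy pair statistics, b would also have to be estimated, for example by a nonlinear fit.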

  • Graph Construction: Vertices (data points) represent users, and an edge represents the similarity between two users. Special cases must be handled: spammers follow random people, and celebrities are followed by random people. Edge weight gets abbreviated.

  • Defining Edge Weight: The weight consists of two components: trustworthiness (TW) and mutual friends (MF).

  • Trustworthiness: The fraction of friends who have the same label (location) as the user.

    Intuition: A person who has stayed at the same place all his life will have most friends from the same location, and hence high trustworthiness. [Figure: a user located in Seattle/WA/USA and his friends, six of whom are located in Seattle/WA/USA; trustworthiness: 0.6]
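    The sketch below computes the trustworthiness score as defined on this slide and reproduces the 0.6 from the figure using a hypothetical ten-friend list. The function name `trustworthiness` is illustrative. Counting friends with no location in the denominator is our assumption, because the slides do not say how unlabeled friends are handled.

```python
def trustworthiness(user_location, friend_locations):
    """Fraction of a user's friends whose location label equals the user's own.

    Assumption: friends with no location (None) still count in the denominator.
    """
    if not friend_locations:
        return 0.0
    same = sum(1 for loc in friend_locations if loc == user_location)
    return same / len(friend_locations)


# Hypothetical friend list: 6 of 10 friends share the user's label
friends = ["Seattle/WA/USA"] * 6 + ["Dallas/TX/USA", "Austin/TX/USA", None, None]
print(trustworthiness("Seattle/WA/USA", friends))  # 0.6
```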

  • Mutual Friends: We chose the number of common friends as the similarity measure, for better accuracy and low time complexity.

  • The edge weight is defined as

    $\text{Weight}_{ij} = \alpha \cdot \max\{TW(U_i), TW(U_j)\} + (1 - \alpha) \cdot MF_{ij}$, where $0 \le \alpha \le 1$.
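    The edge-weight formula can be sketched as follows. The slides do not say how MF_ij is normalized, so `mutual_friends` assumes a Jaccard-style ratio (common friends over the union of both friend sets) to keep both terms in [0, 1]. The friend sets, TW values, and alpha = 0.5 are hypothetical placeholders.

```python
def mutual_friends(friends_i, friends_j):
    """Assumed MF_ij normalization: |common friends| / |all friends of either| (Jaccard)."""
    union = friends_i | friends_j
    return len(friends_i & friends_j) / len(union) if union else 0.0


def edge_weight(tw_i, tw_j, mf_ij, alpha=0.5):
    """Weight_ij = alpha * max(TW(U_i), TW(U_j)) + (1 - alpha) * MF_ij."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * max(tw_i, tw_j) + (1 - alpha) * mf_ij


# Hypothetical friend sets and trustworthiness values
friends_i = {"ann", "bob", "cy", "dee"}
friends_j = {"bob", "cy", "eve", "fay"}
mf = mutual_friends(friends_i, friends_j)   # 2 shared out of 6 distinct friends
w = edge_weight(tw_i=0.6, tw_j=0.4, mf_ij=mf, alpha=0.5)
print(round(w, 4))  # 0.4667
```

    Taking the maximum of the two TW values means a single highly trustworthy endpoint is enough to strengthen the edge.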

  • Tweethood: Fuzzy k-Closest Friends with Variable Depth: Choose the k closest friends of the user. If the location is not found, look further for the answer. Each node is defined by a vector of locations with their respective probabilities. Boost and aggregate at each step.
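    This is a simplified, hypothetical sketch of the variable-depth idea. A labeled user contributes a one-hot vector. An unlabeled user asks its k closest friends (ranked here by edge weight) and recurses until a label is found or the depth budget runs out. Equal-weight averaging of the friends' vectors is an assumption, and the boosting step mentioned on the slide is not modelled. The graph and labels are invented.

```python
from collections import defaultdict


def k_closest(user, graph, k):
    """graph[user] maps friend -> edge weight; return the k highest-weighted friends."""
    ranked = sorted(graph.get(user, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [friend for friend, _ in ranked[:k]]


def location_vector(user, graph, labels, k=3, depth=2, path=frozenset()):
    """Fuzzy location vector for `user`, looking up to `depth` hops away.

    Assumptions: a labeled user yields a one-hot vector; an unlabeled user
    averages the non-empty vectors of its k closest friends with equal weight.
    The slide's 'boost' step is not modelled here.
    """
    if labels.get(user):
        return {labels[user]: 1.0}
    if depth == 0:
        return {}
    path = path | {user}                      # avoid walking back through cycles
    vectors = [location_vector(f, graph, labels, k, depth - 1, path)
               for f in k_closest(user, graph, k) if f not in path]
    vectors = [v for v in vectors if v]
    combined = defaultdict(float)
    for v in vectors:
        for loc, p in v.items():
            combined[loc] += p / len(vectors)
    return dict(combined)


# Hypothetical social graph: john's friend d is labeled but not among the top 3
graph = {
    "john": {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1},
    "a": {"e": 0.5, "f": 0.4},
    "c": {"g": 0.6},
}
labels = {"b": "Dallas/TX/USA", "d": "London/GB", "e": "Dallas/TX/USA",
          "f": "Seattle/WA/USA", "g": "Austin/TX/USA"}
print(location_vector("john", graph, labels, k=3, depth=1))  # {'Dallas/TX/USA': 1.0}
deeper = location_vector("john", graph, labels, k=3, depth=2)
print(deeper)
```

    At depth 1 only friend b contributes. At depth 2 the unlabeled friends a and c are resolved through their own friends, which is the "look further" behaviour described on the slide.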

    Satyen Abrol, Latifur Khan, TweetHood: Agglomerative Clustering on Fuzzy k-Closest Friends with Variable Depth for Location Mining. In Proc. of the Second IEEE International Conference on Social Computing (SocialCom-2010), Minneapolis, USA, August 20-22, 2010

  • Agglomerative Clustering: We don't want to find just any location; we want a location or group of locations with some confidence. There is a trade-off between the number of locations, the distance between concepts, and the total confidence. At each step we construct a matrix with an objective function of these attributes and choose the concepts with maximum values. We continue until we cross the threshold.

  • Find the location of John Doe

  • Social Network of John Doe: friends CB1, CB2, CB3, ..., CBn

  • Choose the k closest friends of John Doe: CB1, CB2, CB3, ..., CBk

  • Identify Locations: CB1: NULL; CB2: NULL; CB3: NULL; CBk: Seattle, USA. Result: LOW ACCURACY

  • What if we have depth = 2? Looking at the friends of CB1, CB2, CB3, ..., CBk yields the locations Seattle/WA/USA, NULL, NULL, Sydney/AU, Dallas/TX/USA, Richardson/TX/USA, NULL, NULL, Dallas/TX/USA, NULL.

  • Location Vectors for John Doe's Friends:
    CB1: Dallas/TX/USA 0.4, Seattle/WA/USA 0.2, Richardson/TX/USA 0.2, Sydney/AU 0.2
    CB2: Dallas/TX/USA 0.33, New Delhi/Delhi/India 0.33, Sunnyvale/CA/USA 0.33
    CB3: Austin/TX/USA 0.50, Minneapolis/MN/USA 0.50
    CBk: Plano/TX/USA 0.25, Boulder/CO/USA 0.25, Salt Lake City/UT/USA 0.25, London/London/GB 0.25

  • Location Vector for John Doe: Dallas/TX/USA 0.1825, Seattle/WA/USA 0.05, Richardson/TX/USA 0.05, Sydney/AU 0.05, New Delhi/Delhi/IN 0.0825, Sunnyvale/CA/USA 0.0825, Austin/TX/USA 0.125, Minneapolis/MN/USA 0.125, Plano/TX/USA 0.0625, Boulder/CO/USA 0.0625, Salt Lake City/UT/US 0.0625, London/GB 0.0625
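    The numbers on the two slides above are consistent with an equal-weight mean of the k = 4 friend vectors. For example, Dallas/TX/USA gets (0.4 + 0.33) / 4 = 0.1825. The short check below reproduces John Doe's vector under that reading. Treating aggregation as an unweighted mean is our interpretation of the example, not a formula stated in the slides. The vectors are assigned to CB1, CB2, CB3, CBk in the order they appear.

```python
from collections import defaultdict

# Friend vectors from the slide (CB1, CB2, CB3, CBk); labels spelled as in John Doe's vector
friend_vectors = [
    {"Dallas/TX/USA": 0.4, "Seattle/WA/USA": 0.2, "Richardson/TX/USA": 0.2, "Sydney/AU": 0.2},
    {"Dallas/TX/USA": 0.33, "New Delhi/Delhi/IN": 0.33, "Sunnyvale/CA/USA": 0.33},
    {"Austin/TX/USA": 0.5, "Minneapolis/MN/USA": 0.5},
    {"Plano/TX/USA": 0.25, "Boulder/CO/USA": 0.25,
     "Salt Lake City/UT/US": 0.25, "London/GB": 0.25},
]


def aggregate(vectors):
    """Equal-weight mean of location vectors (assumed aggregation rule)."""
    total = defaultdict(float)
    for vec in vectors:
        for loc, p in vec.items():
            total[loc] += p / len(vectors)
    return dict(total)


john_doe = aggregate(friend_vectors)
for loc, p in sorted(john_doe.items(), key=lambda kv: -kv[1]):
    print(f"{loc:25s} {p:.4f}")
```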

  • Agglomerative Clustering: applied to the location vector for John Doe.

  • Agglomerative Clustering: Dallas, Plano, and Richardson are merged into one group (0.1825 + 0.0625 + 0.05 = 0.295): {Dallas, Plano, Richardson}/TX/USA 0.295, Seattle/WA/USA 0.05, Sydney/AU 0.05, New Delhi/Delhi/IN 0.0825, Sunnyvale/CA/USA 0.0825, Austin/TX/USA 0.125, Minneapolis/MN/USA 0.125, Boulder/CO/USA 0.0625, Salt Lake City/UT/US 0.0625, London/GB 0.0625
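    The slides merge locations using an objective function that trades off the number of locations, their distance, and the total confidence, but its exact form is not given. The sketch below swaps in plain single-linkage agglomerative clustering with a distance threshold. That is enough to reproduce the {Dallas, Plano, Richardson}/TX/USA group with confidence 0.295. The approximate city coordinates and the 50 km threshold are assumptions added for illustration.

```python
import math


def haversine_km(p, q):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def agglomerate(vector, coords, threshold_km=50.0):
    """Single-linkage merging: repeatedly join the closest pair of clusters while
    their distance stays within threshold_km; a cluster's confidence is the sum."""
    clusters = [[loc] for loc in vector]

    def link(a, b):
        return min(haversine_km(coords[x], coords[y]) for x in a for y in b)

    while len(clusters) > 1:
        d, i, j = min((link(a, b), i, j)
                      for i, a in enumerate(clusters)
                      for j, b in enumerate(clusters) if i < j)
        if d > threshold_km:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]                      # j > i, so index i stays valid
    return {tuple(sorted(c)): sum(vector[loc] for loc in c) for c in clusters}


john_doe = {"Dallas/TX/USA": 0.1825, "Seattle/WA/USA": 0.05, "Richardson/TX/USA": 0.05,
            "Sydney/AU": 0.05, "New Delhi/Delhi/IN": 0.0825, "Sunnyvale/CA/USA": 0.0825,
            "Austin/TX/USA": 0.125, "Minneapolis/MN/USA": 0.125, "Plano/TX/USA": 0.0625,
            "Boulder/CO/USA": 0.0625, "Salt Lake City/UT/US": 0.0625, "London/GB": 0.0625}
# Approximate coordinates (assumption, for illustration only)
coords = {"Dallas/TX/USA": (32.78, -96.80), "Plano/TX/USA": (33.02, -96.70),
          "Richardson/TX/USA": (32.95, -96.73), "Austin/TX/USA": (30.27, -97.74),
          "Seattle/WA/USA": (47.61, -122.33), "Sydney/AU": (-33.87, 151.21),
          "New Delhi/Delhi/IN": (28.61, 77.21), "Sunnyvale/CA/USA": (37.37, -122.04),
          "Minneapolis/MN/USA": (44.98, -93.27), "Boulder/CO/USA": (40.01, -105.27),
          "Salt Lake City/UT/US": (40.76, -111.89), "London/GB": (51.51, -0.13)}

merged = agglomerate(john_doe, coords)
best = max(merged.items(), key=lambda kv: kv[1])
print(best)
```

    Austin stays separate because it lies roughly 290 km from the Dallas group, well beyond the assumed threshold.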

  • Tweethood: Algorithm

  • Tweecalization: Label Propagation: The availability of users with a location is limited, since most users do not have one. We need a method that can learn from unlabeled data.

    Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, Tweecalization: Efficient and Intelligent location mining in Twitter using semi-supervised learning, 8th IEEE International Conference on Collaborative Computing, October 14-17, 2012, Pittsburgh, Pennsylvania

  • Tweecalization: Label Propagation: This is an ideal scenario for semi-supervised learning [1], since only a few friends have locations (labeled data). We use both labeled and unlabeled data for training; points that are close to each other are more likely to share a label.

    [1] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in O. Chapelle, B. Schölkopf, and A. Zien (Eds.), Semi-Supervised Learning. MIT Press, 2006.

  • Label Propagation: An Illustration [Figure: the central user, friends with location, and friends without location; the known locations are CLAMPED]
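    This is a minimal sketch of the textbook iterative label propagation scheme. Each unlabeled node repeatedly takes the weighted average of its neighbors' location distributions, while friends with known locations stay clamped to them. The toy graph, the uniform initialization of unlabeled nodes, and the fixed iteration count are assumptions. It illustrates the general technique, not Tweecalization's exact algorithm.

```python
def label_propagation(weights, seeds, n_iter=100):
    """Iterative label propagation with clamping.

    weights: dict node -> dict neighbor -> non-negative edge weight (symmetric).
    seeds:   dict node -> known location (the clamped, labeled nodes).
    Returns dict node -> {location: score}.
    """
    labels = sorted(set(seeds.values()))
    nodes = sorted(weights)
    # Labeled nodes start one-hot; unlabeled nodes start uniform (assumption).
    dist = {n: ({l: float(l == seeds[n]) for l in labels} if n in seeds
                else {l: 1.0 / len(labels) for l in labels}) for n in nodes}
    for _ in range(n_iter):
        new = {}
        for n in nodes:
            total = sum(weights[n].values())
            if n in seeds or total == 0:      # clamp known locations
                new[n] = dist[n]
                continue
            new[n] = {l: sum(w * dist[m][l] for m, w in weights[n].items()) / total
                      for l in labels}
        dist = new
    return dist


# Hypothetical ego network: central user u, labeled friends a, b, c, unlabeled friend d
weights = {
    "u": {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0},
    "a": {"u": 1.0},
    "b": {"u": 1.0},
    "c": {"u": 1.0, "d": 1.0},
    "d": {"u": 1.0, "c": 1.0},
}
seeds = {"a": "Dallas/TX/USA", "b": "Dallas/TX/USA", "c": "Seattle/WA/USA"}
result = label_propagation(weights, seeds)
print(result["u"], result["d"])
```

    In this toy graph the central user u converges to Dallas/TX/USA with score 4/7. Friend d is tied only to u and to the Seattle friend, and it leans toward Seattle/WA/USA with score 5/7.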

  • Tweecalization: Algorithm

  • What About Temporal Analysis? None of the existing works perform temporal analysis. What about migration/geographical mobility?

  • Migration/Geographical Mobility

    Between 4% and 6% of people move every year, which means 12 to 17 million people each year.

    United States Census Bureau - Geographical Mobility/Migration Data - http://www.census.gov/hhes/migration/

  • Migration/Geographical Mobility: Migration as a function of age

    People aged 20-29 have a higher probability of moving.

    High migration rate: college and jobs. Low migration rate: old age, when people settle down. Source: United States Census Bureau - Geographical Mobility/Migration Data - http://www.census.gov/hhes/migration/

  • Facebook Users and Mobility: Let us look at the cumulative effect.

    Only 28% to 37% of users are currently living in their hometown, based on our experiments on 300k public Facebook profiles.

  • Twitter Users and Mobility: Linking Twitter users to migration

    33% of all Twitter users are aged 25-34 years.

    Based on findings by ABI Research (http://www.abiresearch.com).

  • Tweeque: Graph Partitioning: How do we know whether this is the current location of a user? How do we perform temporal analysis of friendships? We propose a technique that indirectly infers the current location.

    Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, Tweeque: Spatio-Temporal Analysis of Social Networks for Location Mining Using Graph Partitioning, The First ASE/IEEE International Conference on Social Informatics, December 14-16, 2012, Washington D.C., USA.

  • Observation 1: Social Cliques and Location: Our definition: a social clique is an inclusive group of people who share friendship. Apart from friendship, what attribute links the members of a clique? Individual locations: all members of a clique were or are at a particular geographical location at a particular instant of time, such as a college, a school, or a company.

  • Observation 2: Migration and Time: As shown previously, people tend to migrate over the course of time. Based on these two observations, we hypothesize that if we can divide the social graph of a particular user into cliques and check the location-based purity of each clique, we can accurately separate his current location from his previous locations. Migration is our latent time factor.

  • Tweeque: An Example: friends from high school in Dallas, friends from college in Boston, relatives/cousins, and friends from a job in Seattle

  • Tweeque: An Example: all friends of the user

  • Tweeque: An Example: Social Clique #1 (High School), Social Clique #2 (College), Social Clique #3 (Current Work), Social Clique #4 (Relatives)

  • Tweeque: An Example: [Figure: the friends' locations (Dallas/TX/USA, Seattle/WA/USA, San Diego/CA/USA, New York/NY/USA, Boston/MA/USA, Portland/OR/USA, Austin/TX/USA, Singapore, Sydney/Australia, Ontario/Canada, Redmond/WA/USA) grouped into the four cliques] High School: Purity (Dallas) = 0.32; College: Purity (Boston) = 0.45; Relatives: Purity (Dallas) = 0.18; Work: Purity (Seattle) = 0.69.
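    The slides do not define purity precisely. One plausible reading, used in this hypothetical sketch, is the share of a clique's located members that agree on its most common location. Purity voting is then simplified to reporting the dominant location of the purest clique. The clique contents below are invented and do not reproduce the figure's numbers.

```python
from collections import Counter


def purity(clique_locations):
    """Return (dominant location, fraction of located members that share it).

    Assumption: members without a location (None) are ignored.
    """
    known = [loc for loc in clique_locations if loc]
    if not known:
        return None, 0.0
    loc, count = Counter(known).most_common(1)[0]
    return loc, count / len(known)


def purest_location(cliques):
    """Simplified voting: dominant location of the clique with the highest purity."""
    return max((purity(c) for c in cliques.values()), key=lambda lp: lp[1])


# Hypothetical cliques produced by a graph partitioning step
cliques = {
    "high school": ["Dallas/TX/USA", "Dallas/TX/USA", "Austin/TX/USA", "Seattle/WA/USA", None],
    "college": ["Boston/MA/USA", "Boston/MA/USA", "New York/NY/USA", "Portland/OR/USA"],
    "work": ["Seattle/WA/USA", "Seattle/WA/USA", "Seattle/WA/USA", "Redmond/WA/USA"],
    "relatives": ["Dallas/TX/USA", "Singapore", "Sydney/Australia", "Ontario/Canada"],
}
best = purest_location(cliques)
print(best)  # ('Seattle/WA/USA', 0.75)
```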

  • Tweeque: Graph Partitioning

  • Tweeque: Graph Partitioning. Reference: J. Shi and J. Malik, "Normalized Cuts and Image Segmentation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, Aug. 2000.
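    Tweeque cites Shi and Malik's normalized cuts for the partitioning step. The sketch below shows only the standard two-way spectral relaxation: take the eigenvector v for the second-smallest eigenvalue of the normalized Laplacian and split vertices by the sign of D^{-1/2} v. It uses pure-Python power iteration so it runs without extra libraries. The toy graph (two friend groups joined by one weak tie) is hypothetical, and a practical version would use a sparse eigensolver and apply the cut recursively to obtain more cliques.

```python
import math


def ncut_bipartition(W, n_iter=500):
    """Two-way spectral partition in the spirit of the normalized cut.

    W is a symmetric list-of-lists weight matrix with positive degrees.
    Power iteration on M = I + D^-1/2 W D^-1/2, with the trivial eigenvector
    D^1/2 * 1 projected out, converges to the eigenvector of the second-smallest
    eigenvalue of the normalized Laplacian I - D^-1/2 W D^-1/2.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    s = [1.0 / math.sqrt(x) for x in deg]
    u = [math.sqrt(x) for x in deg]                 # trivial eigenvector
    norm_u = math.sqrt(sum(x * x for x in u))
    u = [x / norm_u for x in u]
    v = [float(i + 1) for i in range(n)]            # arbitrary deterministic start
    for _ in range(n_iter):
        dot = sum(a * b for a, b in zip(u, v))
        v = [a - dot * b for a, b in zip(v, u)]     # remove trivial component
        v = [v[i] + sum(s[i] * W[i][j] * s[j] * v[j] for j in range(n))
             for i in range(n)]                     # v <- M v
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    y = [v[i] * s[i] for i in range(n)]             # y = D^-1/2 v
    part_a = [i for i in range(n) if y[i] >= 0]
    part_b = [i for i in range(n) if y[i] < 0]
    return part_a, part_b


# Hypothetical friend graph: two tight groups {0,1,2} and {3,4,5} joined by a weak tie
n = 6
W = [[0.0] * n for _ in range(n)]
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                W[i][j] = 1.0
W[2][3] = W[3][2] = 0.1
part_a, part_b = ncut_bipartition(W)
print(part_a, part_b)
```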

  • Tweeque: Algorithm

  • Tweeque: Purity Voting

  • Experiment Data: 1000 randomly chosen Twitter users

  • Experiments and Results: We observe that the accuracy saturates after depth 4. "Six degrees of separation" is the idea that everyone is, on average, approximately six steps away, by way of introduction, from any other person in the world. For Twitter, this distance is found to be 4.67.

  • Comparison of Different Approaches

    Metric               Tweethood [1]   Tweecalization [2]   Tweeque [3]   Content Based [4]
    Accuracy (City)      72.1%           75.5%                76.3%         35.6%-51%
    Accuracy (Country)   80.1%           80.1%                84.9%         52.3%
    Complexity           O(n)            O(n^3)               O(n^3)        N/A
    Temporal Analysis    No              No                   Yes           Yes

    [1] Satyen Abrol, Latifur Khan, TweetHood: Agglomerative Clustering on Fuzzy k-Closest Friends with Variable Depth for Location Mining. In Proc. of the Second IEEE International Conference on Social Computing (SocialCom-2010), Minneapolis, USA, August 20-22, 2010 (Nominated for best paper award, Acceptance Rate: 13%).
    [2] Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, Tweecalization: Efficient and Intelligent location mining in Twitter using semi-supervised learning, 8th IEEE International Conference on Collaborative Computing, October 14-17, 2012, Pittsburgh, Pennsylvania.
    [3] Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, Tweeque: Spatio-Temporal Analysis of Social Networks for Location Mining Using Graph Partitioning, The First ASE/IEEE International Conference on Social Informatics, December 14-16, 2012, Washington D.C., USA.
    [4] Z. Cheng, J. Caverlee, and K. Lee, "You are where you tweet: A content-based approach to geo-locating Twitter users," in CIKM '10.

  • Contributions: Developed three graph-based location mining algorithms for online social networks. Maps location mining probl...