Event Identification in Social Media
EVENT IDENTIFICATION IN SOCIAL MEDIA. Hila Becker, Luis Gravano (Columbia University); Mor Naaman (Rutgers University). Social media sites host many “event” documents. “Event” = something that occurs at a certain time in a certain place [Yang et al. ’99]
![Page 1: EVENT IDENTIFICATION IN SOCIAL MEDIA](https://reader035.vdocuments.mx/reader035/viewer/2022062519/56814de0550346895dbb4c0a/html5/thumbnails/1.jpg)
EVENT IDENTIFICATION IN SOCIAL MEDIA
Hila Becker, Luis Gravano (Columbia University)
Mor Naaman (Rutgers University)
![Page 2: EVENT IDENTIFICATION IN SOCIAL MEDIA](https://reader035.vdocuments.mx/reader035/viewer/2022062519/56814de0550346895dbb4c0a/html5/thumbnails/2.jpg)
Social Media Sites Host Many “Event” Documents
Photo-sharing: Flickr
Video-sharing: YouTube
Social networking: Facebook
…
“Event” = something that occurs at a certain time in a certain place [Yang et al. ’99]
Popular, widely known events: Presidential Inauguration, Thanksgiving Day Parade
Smaller events, without traditional news coverage: local food drive, street fair
Social media documents for “All Points West” festival, Liberty State Park, New Jersey, 8/8/08
![Page 3: EVENT IDENTIFICATION IN SOCIAL MEDIA](https://reader035.vdocuments.mx/reader035/viewer/2022062519/56814de0550346895dbb4c0a/html5/thumbnails/3.jpg)
Identifying Events and Associated Social Media Documents
Applications: event search and browsing, local search, …
General approach: group similar documents via clustering
Each cluster corresponds to one event and its associated social media documents
![Page 4: EVENT IDENTIFICATION IN SOCIAL MEDIA](https://reader035.vdocuments.mx/reader035/viewer/2022062519/56814de0550346895dbb4c0a/html5/thumbnails/4.jpg)
Event Identification: Challenges
Uneven data quality: missing, short, uninformative text … but revealing structured context available (tags, date/time, geo-coordinates)
Scalability
Dynamic data stream of event information
Unknown number of events: the number of clusters is a necessary input for many clustering algorithms, but it is difficult to estimate
![Page 5: EVENT IDENTIFICATION IN SOCIAL MEDIA](https://reader035.vdocuments.mx/reader035/viewer/2022062519/56814de0550346895dbb4c0a/html5/thumbnails/5.jpg)
Clustering Social Media Documents
Social media document representation
Social media document similarity
Social media document clustering
Clustering task: definition
Ensemble algorithm: combining multiple clustering results
Preliminary evaluation
![Page 6: EVENT IDENTIFICATION IN SOCIAL MEDIA](https://reader035.vdocuments.mx/reader035/viewer/2022062519/56814de0550346895dbb4c0a/html5/thumbnails/6.jpg)
Social Media Document Representation
Title
Description
Tags
Date/Time
Location
All-Text
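The feature list above can be pictured as a small record type. This is a minimal sketch, not the authors' data model: the class name `SocialMediaDocument`, the field names, the example timestamp, and the definition of All-Text as a concatenation of the textual fields are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class SocialMediaDocument:
    """One photo, video, or post with the context features listed on the slide."""
    title: str = ""
    description: str = ""
    tags: Tuple[str, ...] = ()
    timestamp: Optional[datetime] = None            # Date/Time (may be missing)
    location: Optional[Tuple[float, float]] = None  # (latitude, longitude) in degrees

    @property
    def all_text(self) -> str:
        """All-Text: textual fields joined together (an assumed definition)."""
        parts = [self.title, self.description, " ".join(self.tags)]
        return " ".join(p for p in parts if p)


# Hypothetical document; the time of day is made up for the example.
doc = SocialMediaDocument(
    title="All Points West",
    tags=("festival", "libertystatepark"),
    timestamp=datetime(2008, 8, 8, 19, 30),
)
print(doc.all_text)
```

Optional fields reflect the uneven data quality noted earlier: any feature may be missing for a given document.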
![Page 7: EVENT IDENTIFICATION IN SOCIAL MEDIA](https://reader035.vdocuments.mx/reader035/viewer/2022062519/56814de0550346895dbb4c0a/html5/thumbnails/7.jpg)
Social Media Document Similarity
Text: tf-idf weights, cosine similarity
Time: proximity in minutes
Location: geo-coordinate proximity
Features: Title, Description, Tags, All-Text, Date/Time-Keywords, Date/Time-Proximity, Location-Keywords, Location-Proximity
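The three similarity types named on this slide could be sketched as follows. The slide gives only the general idea, so the details here are assumptions: the smoothed idf variant, the linear decay for time with a one-year `window`, the haversine great-circle distance for location, and the 50 km `max_km` cap are illustrative choices, not values taken from the talk.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Turn tokenized documents into sparse tf-idf vectors (term -> weight)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf (an assumed variant) so terms found in every document keep some weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def time_similarity(minutes_a: float, minutes_b: float,
                    window: float = 60 * 24 * 365) -> float:
    """1.0 for identical timestamps, falling linearly to 0.0 at `window` minutes apart."""
    return max(0.0, 1.0 - abs(minutes_a - minutes_b) / window)


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def location_similarity(a: Tuple[float, float], b: Tuple[float, float],
                        max_km: float = 50.0) -> float:
    """1.0 at the same point, falling linearly to 0.0 at `max_km` apart (assumed cap)."""
    return max(0.0, 1.0 - haversine_km(a, b) / max_km)


vecs = tfidf_vectors([
    ["all", "points", "west"],
    ["all", "points", "west", "festival"],
    ["street", "fair"],
])
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))
```

All three functions return values in [0, 1], which keeps the per-feature scores comparable when they are later combined.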
![Page 8: EVENT IDENTIFICATION IN SOCIAL MEDIA](https://reader035.vdocuments.mx/reader035/viewer/2022062519/56814de0550346895dbb4c0a/html5/thumbnails/8.jpg)
Social Media Document Clustering Framework
Social media documents
Document feature representation
Event clusters
![Page 9: EVENT IDENTIFICATION IN SOCIAL MEDIA](https://reader035.vdocuments.mx/reader035/viewer/2022062519/56814de0550346895dbb4c0a/html5/thumbnails/9.jpg)
Clustering: Ensemble Algorithm
Clusterers: Ctitle, Ctags, Ctime
Weights: Wtitle, Wtags, Wtime
Learned in a training step
Consensus function f(C,W): combine ensemble similarities
Ensemble clustering solution
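One way to read f(C,W), following the experimental setup slide later in the deck (a weighted average of the clusterers' binary predictions), is sketched below. The function name `consensus_score`, the dictionary layout, and the weight values are assumptions for illustration; the talk does not give the exact formula.

```python
from typing import Dict, Hashable

# document id -> cluster id assigned by one clusterer
Clustering = Dict[Hashable, int]


def consensus_score(doc_a: Hashable, doc_b: Hashable,
                    clusterings: Dict[str, Clustering],
                    weights: Dict[str, float]) -> float:
    """Weighted average of binary same-cluster predictions, in [0, 1]."""
    total = sum(weights[name] for name in clusterings)
    if total == 0:
        return 0.0
    agree = sum(
        weights[name]
        for name, assignment in clusterings.items()
        if assignment[doc_a] == assignment[doc_b]  # binary vote: same cluster or not
    )
    return agree / total


# Toy output of three clusterers over three documents.
clusterings = {
    "title": {"d1": 0, "d2": 0, "d3": 1},
    "tags":  {"d1": 0, "d2": 0, "d3": 0},
    "time":  {"d1": 0, "d2": 1, "d3": 1},
}
weights = {"title": 0.5, "tags": 0.9, "time": 0.6}  # illustrative values, not from the talk
print(round(consensus_score("d1", "d2", clusterings, weights), 2))
```

Here the title and tags clusterers vote that d1 and d2 belong together and the time clusterer votes against, so the pair's score is the share of total weight held by the agreeing clusterers.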
![Page 10: EVENT IDENTIFICATION IN SOCIAL MEDIA](https://reader035.vdocuments.mx/reader035/viewer/2022062519/56814de0550346895dbb4c0a/html5/thumbnails/10.jpg)
Clustering: Measuring Quality
Homogeneous clusters
Complete clusters
Metric: Normalized Mutual Information (NMI)
Shared information between clustering solution and “ground truth”
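NMI can be computed directly from two label lists. The sketch below normalizes mutual information by the arithmetic mean of the two entropies; that is one common convention and is an assumption here, since the slide does not state which normalization was used.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(predicted: Sequence[Hashable], truth: Sequence[Hashable]) -> float:
    """Mutual information divided by the mean of the two entropies."""
    n = len(predicted)
    joint = Counter(zip(predicted, truth))
    pred_counts, true_counts = Counter(predicted), Counter(truth)
    mutual_info = sum(
        (c / n) * math.log(c * n / (pred_counts[p] * true_counts[t]))
        for (p, t), c in joint.items()
    )
    denom = (entropy(predicted) + entropy(truth)) / 2
    # Both sides are a single cluster: treat the identical trivial solutions as a match.
    return mutual_info / denom if denom > 0 else 1.0


truth = ["a", "a", "b", "b"]
perfect = [1, 1, 0, 0]   # same grouping under different cluster ids
split = [0, 1, 2, 2]     # homogeneous, but event "a" is spread over two clusters
print(nmi(perfect, truth) > nmi(split, truth))
```

The `split` example is homogeneous (no cluster mixes events) but not complete, so it scores below the perfect grouping.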
![Page 11: EVENT IDENTIFICATION IN SOCIAL MEDIA](https://reader035.vdocuments.mx/reader035/viewer/2022062519/56814de0550346895dbb4c0a/html5/thumbnails/11.jpg)
Experimental Setup
Data: >270K Flickr photos
Event labels from Yahoo!’s “upcoming” event database
Split into 3 parts for training/validation/testing
Clusterers: single pass algorithm with centroid similarity
Weighting scheme: Normalized Mutual Information (NMI) scores on validation set
Consensus function: weighted average of clusterers’ binary predictions
Final prediction step: single pass clustering algorithm
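A single pass algorithm with centroid similarity can be sketched as below: each document is compared with the centroid of every existing cluster and either joins the most similar one or starts a new cluster. The `threshold` parameter, the dense vector input, and the use of cosine similarity are assumptions for this illustration; the setup slide does not specify them.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def single_pass_cluster(vectors: List[Sequence[float]], threshold: float) -> List[int]:
    """Assign each vector, in arrival order, to the most similar centroid or a new cluster."""
    sums: List[List[float]] = []  # per-cluster running sum of member vectors
    sizes: List[int] = []         # per-cluster member count
    labels: List[int] = []
    for vec in vectors:
        best, best_sim = -1, -1.0
        for cid, (total, size) in enumerate(zip(sums, sizes)):
            centroid = [x / size for x in total]
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = cid, sim
        if best >= 0 and best_sim >= threshold:
            # Join the closest cluster and update its centroid incrementally.
            sums[best] = [s + x for s, x in zip(sums[best], vec)]
            sizes[best] += 1
            labels.append(best)
        else:
            # Nothing is similar enough: open a new cluster for this document.
            sums.append(list(vec))
            sizes.append(1)
            labels.append(len(sums) - 1)
    return labels


stream = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (1.0, 0.1)]
print(single_pass_cluster(stream, threshold=0.8))  # [0, 0, 1, 1, 0]
```

This style of algorithm fits the challenges listed earlier: it handles documents as a stream and does not need the number of events in advance, at the cost of a similarity threshold that has to be tuned.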
![Page 12: EVENT IDENTIFICATION IN SOCIAL MEDIA](https://reader035.vdocuments.mx/reader035/viewer/2022062519/56814de0550346895dbb4c0a/html5/thumbnails/12.jpg)
Preliminary Evaluation Results
Individual clusterer performance
Highest NMI: Tags, All-Text
Lowest NMI: Description, Title
Ensemble performance, compared against all individual clusterers
Highest overall performance in terms of NMI
More homogeneous clusters: each event is spread over fewer clusters
Details in paper
![Page 13: EVENT IDENTIFICATION IN SOCIAL MEDIA](https://reader035.vdocuments.mx/reader035/viewer/2022062519/56814de0550346895dbb4c0a/html5/thumbnails/13.jpg)
Future Work: Alternative Choices
Document similarity metric
Ensemble approach
Weight assignment
Choice of clusterers
Train a classifier to predict document similarity
Features correspond to similarity scores
All-text, title, tags, time, location, etc.
Numeric values in [0,1]
State-of-the-art classifiers: SVM, Logistic Regression, …
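The classifier idea could look roughly like this: each document pair becomes a feature vector of per-feature similarity scores in [0,1], and a model learns to predict whether the pair belongs to the same event. The sketch uses a hand-written logistic regression trained by batch gradient descent; the toy data, the feature order, the learning rate, and the epoch count are all assumptions, and a library implementation (or an SVM) would be the more usual choice in practice.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def same_event_probability(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Predicted probability that a document pair with features x shares an event."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic(features: List[Sequence[float]], labels: List[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = same_event_probability(x, w, b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Each row: (all-text, tags, time, location) similarity for one document pair; toy values.
X = [(0.9, 0.8, 0.95, 0.9), (0.7, 0.9, 0.9, 0.8),
     (0.1, 0.0, 0.3, 0.1), (0.2, 0.1, 0.1, 0.0)]
y = [1, 1, 0, 0]  # 1 = same event, 0 = different events
w, b = train_logistic(X, y)
print(same_event_probability((0.8, 0.7, 0.9, 0.85), w, b) > 0.5)
```

The predicted probability could then serve as the pairwise document similarity fed into a clustering step, in place of the weighted ensemble vote.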
![Page 14: EVENT IDENTIFICATION IN SOCIAL MEDIA](https://reader035.vdocuments.mx/reader035/viewer/2022062519/56814de0550346895dbb4c0a/html5/thumbnails/14.jpg)
Future Work: Alternative Choices
Final clustering step
Apply graph partitioning algorithms
Requires estimating the number of clusters
Evaluation metrics: beyond NMI
Datasets: Flickr, LastFM, YouTube
Exploit social network connections
![Page 15: EVENT IDENTIFICATION IN SOCIAL MEDIA](https://reader035.vdocuments.mx/reader035/viewer/2022062519/56814de0550346895dbb4c0a/html5/thumbnails/15.jpg)
Conclusions
Identified events and their corresponding social media documents
Proposed a clustering solution
Leveraged different representations of social media documents
Employed various social media similarity metrics
Developed a weighted ensemble clustering approach
Reported preliminary results of our event identification approach on a large-scale dataset of Flickr photographs