Media REVEALr: A social multimedia monitoring and intelligence system for Web multimedia verification



Katerina Andreadou (1), Symeon Papadopoulos (1), Lazaros Apostolidis (1), Anastasia Krithara (2) and Yiannis Kompatsiaris (1)

(1) Centre for Research and Technology Hellas (CERTH) – Information Technologies Institute (ITI)

(2) National Centre for Scientific Research ‘Demokritos’ (NCSR ‘D’)

PAISI 2015, May 19, 2015, Ho Chi Minh City, Vietnam


Can multimedia on the Web be trusted?


Real photo captured in April 2011 by WSJ, but heavily tweeted during Hurricane Sandy (29 Oct 2012)

Tweeted by multiple sources & retweeted multiple times

Original online at: http://blogs.wsj.com/metropolis/2011/04/28/weather-journal-clouds-gathered-but-no-tornado-damage/


The Problem

• Everyone can easily publish content on the Web
• Content can be easily repurposed and manipulated
• News outlets are competing for views and clicks

Pressure for airing stories very quickly leaves very little room for verification. Very often, even well-reputed news providers fall for fake news content.

• Multiple tools and services are available for individual tasks, resulting in a complex verification process

Very hard and time consuming to check the veracity of Web multimedia


Media REVEALr

• Developed within the REVEAL project: http://revealproject.eu/

• Framework for collecting, indexing and browsing multimedia content from the Web and social media

• Support for verification:
  – Near-duplicate detection against an indexed collection
  – Clustering of social media posts by visual similarity (comparative view of the same incident)
  – Aggregation and visualization of Named Entities around an incident


Related Work

• The majority of works have focused on the problem of topic detection and summarization:
  – TwitInfo (Marcus et al., 2011)
  – Twittermonitor (Mathioudakis & Koudas, 2010)
  – Meme detection & prediction (Weng et al., 2014)

• Visual memes and clustering
  – Visual meme tracking (Xie et al., 2011)
  – Supervised multimodal clustering (Petkos et al., 2012)

• Image manipulation tracking
  – Internet image archaeology (Kennedy & Chang, 2008)


Overview of Media REVEALr


[Architecture diagram. Components: media collection; media pre-processing & feature extraction; media analysis, mining & indexing; persistence; access (API); visualization / front-end. Labels: TEXT, VISUAL]


Named Entity Detection

• The brevity and noisy nature of text in social media pose a serious challenge

• Employed solution:
  – Pre-processing: tokenization, user mention resolution, text cleaning
  – Stanford NER + user mention resolution
  – Regular expressions to remove special characters and symbols (e.g., #, @, URLs, etc.)
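The text cleaning step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the regular expressions, the function name `clean_tweet` and the `mention_names` lookup table (handle to display name) are all assumptions made for the example, and the NER step itself is left out.

```python
import re

# Illustrative patterns (assumptions, not taken from the slides).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@(\w+)")
SYMBOL_RE = re.compile(r"[^\w\s.,'-]")  # anything that is not a word char, space or basic punctuation


def clean_tweet(text, mention_names=None):
    """Prepare a tweet for NER: drop URLs, resolve mentions, strip symbols."""
    mention_names = mention_names or {}
    text = URL_RE.sub(" ", text)
    # Replace @handle with a known display name; otherwise keep the bare handle.
    text = MENTION_RE.sub(
        lambda m: mention_names.get(m.group(1).lower(), m.group(1)), text
    )
    # Remove remaining special characters such as '#', ':' and '!'.
    text = SYMBOL_RE.sub(" ", text)
    # Collapse the whitespace left behind by the substitutions.
    return " ".join(text.split())


cleaned = clean_tweet(
    "RT @BarackObama: Stay safe #Sandy http://t.co/abc !!",
    {"barackobama": "Barack Obama"},
)
print(cleaned)  # RT Barack Obama Stay safe Sandy
```

Resolving a mention to a display name before running NER gives the tagger a plain person or organization name to work with instead of an opaque handle.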


Visual Indexing

• Content-based image retrieval to solve the Near-Duplicate Search (NDS) problem

• Based on local descriptors (SURF), aggregation (VLAD), dimensionality reduction (PCA), quantization (PQ) and indexing (IVFADC)

• State-of-the-art visual similarity search
  – High precision/recall
  – Very efficient and scalable implementation (search many millions of images in a few msec, maintain full index in memory using ~1GB/10M images)
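To make the aggregation step more concrete, here is a minimal sketch of VLAD on toy data. It assumes 2-D descriptors and a two-word codebook in place of real SURF descriptors and a learned vocabulary, and it leaves out the PCA, PQ and IVFADC stages; the function name `vlad` is chosen for this example only.

```python
import math


def vlad(descriptors, centroids):
    """Aggregate local descriptors into one L2-normalized VLAD vector."""
    dim = len(centroids[0])
    residuals = [[0.0] * dim for _ in centroids]
    for d in descriptors:
        # Hard-assign the descriptor to its nearest centroid (visual word).
        k = min(range(len(centroids)), key=lambda i: math.dist(d, centroids[i]))
        # Accumulate the residual (descriptor minus centroid) for that word.
        for j in range(dim):
            residuals[k][j] += d[j] - centroids[k][j]
    # Concatenate the per-word residual sums and L2-normalize the result.
    flat = [v for row in residuals for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Toy codebook and descriptors (assumptions for illustration).
centroids = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(3.0, 0.0), (10.0, 14.0)]
vector = vlad(descriptors, centroids)
```

The output length is the number of centroids times the descriptor dimension, which is why dimensionality reduction and quantization are applied afterwards to keep the index small.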


Improving NDS Resilience (NDS+)

• Often, NDS performance suffers from overlay graphics and fonts

• To address this issue, we integrate a descriptor-level classifier that tries to remove the font/graphic descriptors from the VLAD vector
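The filtering step can be sketched as a simple pre-pass over the local descriptors before aggregation. The classifier is represented here by a hypothetical callable `is_overlay`; the toy rule used below is a placeholder for illustration only and says nothing about how the trained classifier decides.

```python
def filter_overlay_descriptors(descriptors, is_overlay):
    """Drop descriptors that the classifier flags as font/graphic overlay."""
    return [d for d in descriptors if not is_overlay(d)]


def toy_is_overlay(descriptor):
    # Placeholder decision rule, for illustration only; a trained
    # descriptor-level classifier would take its place.
    return max(descriptor) > 0.9


descriptors = [(0.2, 0.1), (0.95, 0.3), (0.4, 0.5)]
kept = filter_overlay_descriptors(descriptors, toy_is_overlay)
# Only the descriptors in `kept` would then be aggregated into the VLAD vector.
```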


Example: Filtering Out Font Descriptors

• Assuming that in most cases the classifier is correct, the resulting VLAD vector is of much higher quality compared to the one without filtering


Classifier Details

• Random Forest used as base classifier
• Cost-sensitive meta-classifier to penalize misclassification of positive instances
• Challenge due to class imbalance (overlay descriptors << useful image content descriptors)
  – Cost-sensitive meta-classifier performs over-sampling of the minority class to balance the training set
• Training set created by collecting images with overlays (e.g. memes) from the Web and manually annotating them (selecting areas with fonts/overlays)
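The balancing idea can be illustrated with plain random over-sampling. This is a simplified stand-in for the cost-sensitive meta-classifier mentioned above, not a description of its internals; the function name `oversample_minority` and the fixed seed are assumptions for the example.

```python
import random


def oversample_minority(samples, labels, seed=0):
    """Randomly duplicate samples of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    # Every class is grown to the size of the largest one.
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for y, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for s in group + extra:
            out_samples.append(s)
            out_labels.append(y)
    return out_samples, out_labels


# Toy data: 2 overlay descriptors (label 1) versus 8 content descriptors (label 0).
samples = list(range(10))
labels = [1, 1] + [0] * 8
balanced_samples, balanced_labels = oversample_minority(samples, labels)
```

After balancing, both classes contribute the same number of training instances, so the base classifier is no longer rewarded for always predicting the majority class.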


Mining: Clustering and Aggregation

• Visual aggregation
  – DBSCAN on the visual feature representation (PCA-reduced VLAD vectors)
  – Representative element (tweet) selected based on the largest number of keywords (expected to result in more information)

• Entity aggregation
  – NER on individual items
  – Entity categorization (Persons, Locations, Organizations)
  – Entity ranking based on frequency of occurrence
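Both mining steps can be sketched in a few lines. The block below is a textbook DBSCAN over toy 2-D points (standing in for PCA-reduced VLAD vectors) followed by a frequency ranking of entities per category. The parameter values, the function names and the `(entity, category)` input format are assumptions for illustration, not the system's actual configuration.

```python
import math
from collections import Counter, defaultdict

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later be claimed as a border point
            continue
        labels[i] = cluster
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                seeds.extend(reachable)  # j is a core point, keep expanding
        cluster += 1
    return labels


def rank_entities(items):
    """items: one list of (entity, category) pairs per post; rank by frequency."""
    counts = defaultdict(Counter)
    for entities in items:
        for name, category in entities:
            counts[category][name] += 1
    return {category: c.most_common() for category, c in counts.items()}


vectors = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0),
           (5.0, 5.0), (5.0, 5.1), (5.1, 5.0), (9.0, 9.0)]
cluster_labels = dbscan(vectors, eps=0.5, min_pts=3)

ranking = rank_entities([
    [("Boston", "LOCATION"), ("FBI", "ORGANIZATION")],
    [("Boston", "LOCATION")],
    [("Watertown", "LOCATION")],
])
```

DBSCAN suits this setting because the number of clusters is not known in advance and isolated images are simply left unclustered as noise.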


User Interface: Collections View


User Interface: Items View & Search


User Interface: Clusters View


User Interface: Entities View


Evaluation: NER

• Manual annotation of 400 tweets from the SNOW Data Challenge dataset (Papadopoulos et al., 2014)

• Measure: Accuracy (an instance is considered correct when both the entity and its type are correctly identified)

• Three competing solutions:
  – Base Stanford NER (S-NER)
  – S-NER + Extensions/Post-processing (S-NER+)
  – Ellogon library (http://www.ellogon.org)
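One way to read the accuracy measure is as the fraction of manually annotated instances for which a system returns the same entity text with the same type. The sketch below follows that reading; the exact protocol used in the evaluation is not given on the slide, so the tuple format `(tweet_id, entity, type)` and the function name are assumptions.

```python
def ner_accuracy(gold, predicted):
    """Share of gold (tweet_id, entity, type) instances matched exactly by the system."""
    if not gold:
        return 0.0
    predicted = set(predicted)
    # An instance counts only if both the entity text and its type agree.
    correct = sum(1 for instance in gold if instance in predicted)
    return correct / len(gold)


gold = [
    (1, "Boston", "LOCATION"),
    (1, "FBI", "ORGANIZATION"),
    (2, "Obama", "PERSON"),
    (3, "Ukraine", "LOCATION"),
]
predicted = [
    (1, "Boston", "LOCATION"),
    (1, "FBI", "PERSON"),      # right entity, wrong type: not counted
    (2, "Obama", "PERSON"),
]
accuracy = ner_accuracy(gold, predicted)  # 2 of 4 gold instances matched
```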


Evaluation: NDS

• Benchmark Datasets
  – Holidays: 1,491 images, 500 queries (Jegou et al., 2008)
  – Oxford: 5,063 images, 55 queries (Philbin et al., 2008)
  – Paris: 6,412 images, 55 queries (Philbin et al., 2008)

• Accuracy: mean Average Precision (mAP)
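For reference, mean Average Precision can be computed as in the sketch below: for each query, precision is averaged over the ranks at which relevant images appear, and the per-query values are then averaged. The input format is an assumption for this example, and the official evaluation scripts of the benchmarks may handle details (such as ignored images) differently.

```python
def average_precision(ranked, relevant):
    """AP for one query: mean of precision@k over the ranks k holding a relevant item."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(results):
    """results: one (ranked_list, relevant_set) pair per query."""
    return sum(average_precision(r, rel) for r, rel in results) / len(results)


# Two toy queries (image ids are placeholders).
results = [
    (["a", "x", "b"], {"a", "b"}),  # AP = (1/1 + 2/3) / 2
    (["x", "c"], {"c"}),            # AP = (1/2) / 1
]
m_ap = mean_average_precision(results)
```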


[Results tables: clean dataset, noisy dataset]


Evaluation: NDS

• Execution Time (msec)

• Example


[Figure panels: indexed image, query image. NDS: #27; NDS+: #1]


Use Cases: Real-world Datasets


[Datasets: sandy, boston, malaysia, ferry]


NDS Use Case (boston)


Clustering Use Case (boston)

• Visual clustering enables a comparative view and analysis over time (in this case showing increasing confidence in the picture).

• When journalists see many similar photos of the same scene, they have more confidence that it is real and not fabricated.


Entity Aggregation Use Case (snow)


[Figure panels: locations, persons, organizations]


Conclusion

• Key contributions
  – Framework and web application offering valuable verification support for Web multimedia
  – High-quality individual components for NER, NDS, clustering and aggregation

• Future Work
  – Incremental image clustering
  – Temporal views to explore evolution of a story
  – Multimedia forensics toolbox (splice, copy-move detection)


Future Work: Web Multimedia Forensics

• Possibility to offer image manipulation detection as a service for arbitrary Web images
  – Challenges: social media platforms apply additional transformations (scaling, JPEG recompression, etc.), making the problem much more complex


References (1/2)

• A. Marcus, M. S. Bernstein, O. Badar, D. R. Karger, S. Madden, and R. C. Miller. TwitInfo: Aggregating and visualizing microblogs for event exploration. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '11, pages 227-236, New York, NY, USA, 2011. ACM.

• M. Mathioudakis and N. Koudas. Twittermonitor: Trend detection over the Twitter stream. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD '10, pages 1155-1158, New York, NY, USA, 2010. ACM.

• G. Petkos, S. Papadopoulos, and Y. Kompatsiaris. Social event detection using multimodal clustering and integrating supervisory signals. In Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, ICMR '12, pages 23:1-23:8, New York, NY, USA, 2012. ACM.

• L. Weng, F. Menczer, and Y. Ahn. Predicting successful memes using network and community structure. CoRR, abs/1403.6199, 2014.

• L. Xie, A. Natsev, J. R. Kender, M. Hill, and J. R. Smith. Visual memes in social media: Tracking real-world news in YouTube videos. In Proceedings of the 19th ACM International Conference on Multimedia, MM '11, pages 53-62, New York, NY, USA, 2011. ACM.


References (2/2)

• L. Kennedy and S.-F. Chang. Internet image archaeology: Automatically tracing the manipulation history of photographs on the web. In Proceedings of the 16th ACM International Conference on Multimedia, MM '08, pages 349-358, New York, NY, USA, 2008. ACM.

• H. Jegou, M. Douze, and C. Schmid. Hamming embedding and weak geometric consistency for large scale image search. In Proceedings of the 10th European Conference on Computer Vision: Part I, ECCV '08, pages 304-317, Berlin, Heidelberg, 2008. Springer-Verlag.

• S. Papadopoulos, D. Corney, and L. M. Aiello. SNOW 2014 Data Challenge: Assessing the performance of news topic detection methods in social media. In Proceedings of the SNOW 2014 Data Challenge Workshop co-located with 23rd International World Wide Web Conference (WWW 2014), Seoul, Korea, April 8, 2014, pages 1-8, 2014.

• J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), pages 1-8, June 2008.


Thank you!

• Resources:
  – Slides: http://www.slideshare.net/sympapadopoulos/mediarevealr
  – Code: https://github.com/MKLab-ITI/reveal-media-crawler
          https://github.com/MKLab-ITI/multimedia-indexing
  – Data: https://github.com/MKLab-ITI/image-verification-corpus

• Get in touch:
  – @sympapadopoulos / [email protected]
  – @kandreads / [email protected]
