ainl 2016: moskvichev
Data Augmentation Method for the Image Sentiment Analysis
Alexander Rakovsky¹, Arseny Moskvichev², Andrey Filchenkov¹
¹ITMO University, Saint Petersburg, Russia
²Saint Petersburg State University, Saint Petersburg, Russia
Image sentiment analysis
Positiveness: 0.9
Positiveness: 0.01
Why is it important?
Two words:
Social networks
How do we approach it?
1. Collect lots of labeled images
2. Train a convolutional neural network
3. ???
4. Profit
Problem!
Solution
Data augmentation.
1. Get a few manually labeled images with corresponding hashtags
2. Learn to reconstruct labels from hashtags
3. Collect as much labeled data as you need!
Details
• Collecting data through the Flickr API (using keywords)
• Assessors evaluate the emotional colouring (positiveness) of each image
• Converting hashtags to vector representations (word2vec) and averaging them
• Using machine learning to predict the assessors’ ratings
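The last two steps can be illustrated with a minimal sketch. It is not the authors' implementation: the 3-dimensional vectors in TOY_VECTORS stand in for pretrained word2vec embeddings, the labeled examples and scores are invented, and the choices of cosine distance and k=2 are assumptions made only for illustration.

```python
import math

# Hypothetical toy embeddings; a real system would load pretrained word2vec vectors.
TOY_VECTORS = {
    "sunset": [0.9, 0.1, 0.0],
    "beach": [0.8, 0.2, 0.1],
    "love": [1.0, 0.0, 0.1],
    "funeral": [0.0, 0.9, 0.2],
    "war": [0.1, 1.0, 0.0],
    "rain": [0.3, 0.6, 0.3],
}


def average_hashtags(hashtags, vectors):
    """Average the vectors of the known hashtags; return None if none are known."""
    known = [vectors[tag] for tag in hashtags if tag in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def knn_predict(query, train_x, train_y, k=2):
    """kNN regression: mean positiveness of the k nearest labeled images."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: cosine_distance(query, pair[0]))
    nearest = ranked[:k]
    return sum(score for _, score in nearest) / len(nearest)


# Invented "manually labeled" sample: (hashtags, assessor positiveness).
labeled = [
    (["sunset", "beach"], 0.9),
    (["love"], 0.95),
    (["funeral"], 0.05),
    (["war", "rain"], 0.1),
]
train_x = [average_hashtags(tags, TOY_VECTORS) for tags, _ in labeled]
train_y = [score for _, score in labeled]

# Label a new, unlabeled image from its hashtags alone.
query = average_hashtags(["beach", "love"], TOY_VECTORS)
predicted = knn_predict(query, train_x, train_y)
print(f"predicted positiveness: {predicted:.3f}")
```

Thresholding the predicted score (for example at 0.5) would turn the same sketch into a classifier; the threshold is likewise an assumption, not something stated in the talk.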
(Preliminary!) Results
• kNN accuracy on classification task: 0.95
• Average correlation between assessors: 0.86
• Between the kNN regression and assessors: 0.83
• Using this algorithm is almost as good as hiring one more assessor!
• Suspiciously good...
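The comparison above rests on correlation coefficients. The slides do not say which coefficient was used, so the sketch below assumes Pearson correlation, and all scores in it are invented toy values rather than the reported data. It shows how agreement between two assessors can be put on the same scale as agreement between a model and an assessor.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented positiveness scores for five images.
assessor_a = [0.9, 0.1, 0.7, 0.3, 0.5]
assessor_b = [0.8, 0.2, 0.6, 0.4, 0.5]
model_pred = [0.7, 0.3, 0.8, 0.2, 0.5]

human_agreement = pearson(assessor_a, assessor_b)
model_agreement = pearson(model_pred, assessor_a)
print(f"assessor vs assessor: {human_agreement:.2f}")
print(f"model vs assessor:    {model_agreement:.2f}")
```

When the model's correlation with an assessor approaches the correlation between assessors, the model behaves roughly like an additional human rater on that sample.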
Details
• Collecting data through the Flickr API (using keywords)
• Assessors evaluate the emotional colouring (positiveness) of each image
• Converting hashtags to vector representations (word2vec) and averaging them
• Using machine learning to predict the assessors’ ratings
Nonrepresentative sample!
Pros
• Easy to use (no word preprocessing)
• Good results* (compared to dictionary-based solutions)
Cons
• Needs pre-training and an initial manually labeled sample
Conclusions
• The proposed method affords a simple and efficient hashtag-based data augmentation solution for image sentiment analysis.
• More work is to be done to estimate the method’s performance on a general set of images.
Thank you!