AINL 2016: Moskvichev

Data Augmentation Method for the Image Sentiment Analysis Alexander Rakovsky 1 , Arseny Moskvichev 2 , Andrey Filchenkov 1 1 ITMO University Saint Petersburg, Russia 2 Saint Petersburg State University Saint Petersburg, Russia [email protected], [email protected], [email protected]

Upload: lidia-pivovarova

Post on 15-Apr-2017


TRANSCRIPT

Page 1: AINL 2016: Moskvichev

Data Augmentation Method for the Image Sentiment Analysis

Alexander Rakovsky 1, Arseny Moskvichev 2, Andrey Filchenkov 1

1 ITMO University, Saint Petersburg, Russia

2 Saint Petersburg State University, Saint Petersburg, Russia

[email protected], [email protected], [email protected]

Page 2: AINL 2016: Moskvichev

Image sentiment analysis

Positiveness: 0.9

Positiveness: 0.01

Page 3: AINL 2016: Moskvichev

Why is it important?

Two words:

Social networks

Page 4: AINL 2016: Moskvichev

How do we approach it?

1. Collect lots of labeled images

2. Train a convolutional neural network

3. ???

4. Profit

Page 5: AINL 2016: Moskvichev

How do we approach it?

1. Collect lots of labeled images

2. Train a convolutional neural network

3. ???

4. Profit

Problem!

Page 6: AINL 2016: Moskvichev

Solution

Data augmentation.

1. Get a few manually labeled images with corresponding hashtags

2. Learn to reconstruct labels from hashtags

3. Collect as much labeled data as you need!

Page 7: AINL 2016: Moskvichev

Details

• Collecting data through FLICKR API (using keywords)

• Assessors evaluate the emotional colouring (positiveness) of each image

• Converting hashtags to vector representation (word2vec), and averaging them

• Using machine learning to predict the assessors’ ratings
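The pipeline on this slide can be sketched in a few lines. This is a minimal illustration and not the authors' code: the `EMBEDDINGS` table holds made-up 2-d vectors standing in for real word2vec vectors, the labeled sample and its scores are invented, and the function names (`average_vector`, `knn_predict`, `predict_positiveness`) are hypothetical. The slides do not state the distance metric or the value of k, so Euclidean distance and k=1 are assumptions.

```python
import math

# Toy 2-d "embeddings" standing in for real word2vec vectors (hypothetical values).
EMBEDDINGS = {
    "sunset": (0.9, 0.1),
    "love": (1.0, 0.0),
    "beach": (0.8, 0.2),
    "war": (0.0, 1.0),
    "sad": (0.1, 0.9),
    "rain": (0.3, 0.7),
}

def average_vector(hashtags):
    """Average the vectors of the known hashtags; None if no hashtag is known."""
    vectors = [EMBEDDINGS[t] for t in hashtags if t in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dim))

def knn_predict(train, query, k=1):
    """Predict positiveness as the mean label of the k nearest training vectors."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return sum(label for _, label in nearest) / len(nearest)

# Small manually labeled sample: (hashtags, assessor positiveness in [0, 1]).
# The scores are invented for illustration.
labeled = [
    (["sunset", "beach"], 0.9),
    (["love"], 0.95),
    (["war"], 0.05),
    (["sad", "rain"], 0.2),
]
train = [(average_vector(tags), score) for tags, score in labeled]

def predict_positiveness(hashtags, k=1):
    """Label a new image from its hashtags alone; None if no hashtag is known."""
    query = average_vector(hashtags)
    if query is None:
        return None
    return knn_predict(train, query, k=k)

print(predict_positiveness(["love", "beach"]))
print(predict_positiveness(["war", "sad"]))
```

Once such a predictor is trained on the small assessed sample, any further image that comes with hashtags can be given an automatic positiveness label, which is the augmentation step.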

Page 8: AINL 2016: Moskvichev

(Preliminary!) Results

• kNN accuracy on classification task: 0.95

• Average correlation between assessors: 0.86

• Between the kNN regression and assessors: 0.83

• Using this algorithm is almost as good as hiring one more assessor!

• Suspiciously good...
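The comparison between inter-assessor agreement and model-to-assessor agreement can be computed along the following lines. This is a sketch under assumptions: the slides do not say which correlation coefficient was used, so Pearson correlation is assumed here, and the ratings and predictions below are invented numbers, not the reported data. The helper names are hypothetical.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mean_pairwise_correlation(raters):
    """Average Pearson correlation over all pairs of raters."""
    pairs = list(combinations(raters, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

def mean_correlation_with(raters, predictions):
    """Average Pearson correlation between the predictions and each rater."""
    return sum(pearson(r, predictions) for r in raters) / len(raters)

# Invented ratings of five images by three assessors, plus model predictions.
assessors = [
    [0.9, 0.1, 0.5, 0.8, 0.2],
    [0.8, 0.2, 0.6, 0.9, 0.1],
    [1.0, 0.0, 0.4, 0.7, 0.3],
]
model = [0.85, 0.15, 0.55, 0.6, 0.3]

print(round(mean_pairwise_correlation(assessors), 2))
print(round(mean_correlation_with(assessors, model), 2))
```

If the second number approaches the first, the model agrees with the assessors about as well as they agree with each other, which is the sense of the "one more assessor" remark.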

Page 9: AINL 2016: Moskvichev

Details

• Collecting data through FLICKR API (using keywords)

• Assessors evaluate the emotional colouring (positiveness) of each image

• Converting hashtags to vector representation (word2vec), and averaging them

• Using machine learning to predict the assessors’ ratings

Nonrepresentative sample!

Page 10: AINL 2016: Moskvichev

Pros

• Easy to use (no word preprocessing)

• Good results* (compared to dictionary-based solutions)

Page 11: AINL 2016: Moskvichev

Cons

• Needs pre-training and an initial manually labeled sample

Page 12: AINL 2016: Moskvichev

Conclusions

• The proposed method affords a simple and efficient hashtag-based data augmentation solution for image sentiment analysis.

• More work is to be done to estimate the method’s performance on a general set of images.

Page 13: AINL 2016: Moskvichev

Thank you!