paper presentation - home | computer science at ubclsigal/532l/pp_towards... · paper presentation...

Paper presentation: Towards diverse and natural image descriptions via a conditional GAN, B. Dai et al.


Post on 07-Jul-2020


Has the problem of generating image descriptions been solved?


Problem Statement
● Image captioning with a focus on fidelity, naturalness, and diversity

Captions generated by 3 different methods for the given image on the left side. [1]


Motivation
● Captions produced by state-of-the-art methods are rigid and show little variability
○ Captions closely resemble the ground truth
● No available metrics capture variability and naturalness
○ Some of the current standard metrics:
■ BLEU
■ METEOR


Related Work
● Generation
○ Detection-based approaches
■ CRF, SVM, CNN
■ Retrieving sentences from existing data or using sentence templates
○ Encoder-decoder paradigm
■ Maximum likelihood


Related Work
● Evaluation
○ Classical metrics
■ BLEU: Precision of n-grams
■ ROUGE: Recall of n-grams
○ METEOR
■ Combination of precision and recall
○ CIDEr
■ Weighted n-gram statistics
○ SPICE
■ Focus on linguistic entities reflecting visual concepts
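
As a rough illustration of the precision versus recall distinction between BLEU and ROUGE, here is a minimal Python sketch of clipped n-gram precision and n-gram recall against a single reference caption. The function names and example captions are hypothetical, and the sketch leaves out BLEU's brevity penalty and the combination over several n that the full metrics use.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_overlap(candidate, reference, n):
    """Clipped count of candidate n-grams that also occur in the reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    return sum(min(count, ref[gram]) for gram, count in cand.items())

def ngram_precision(candidate, reference, n=1):
    """BLEU-style: matched n-grams divided by n-grams in the candidate."""
    total = sum(ngrams(candidate, n).values())
    return ngram_overlap(candidate, reference, n) / total if total else 0.0

def ngram_recall(candidate, reference, n=1):
    """ROUGE-style: matched n-grams divided by n-grams in the reference."""
    total = sum(ngrams(reference, n).values())
    return ngram_overlap(candidate, reference, n) / total if total else 0.0

# Hypothetical captions for illustration only.
candidate = "a man riding a horse".split()
reference = "a man is riding a brown horse".split()
```

A short candidate that only uses words from the reference gets perfect unigram precision but lower recall, which is one reason a single overlap score says little about diversity or naturalness.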


What is the state of the art model for image captioning?


State of the art

LSTM based model for image captioning. [2]

● Combination of LSTM and CNN

● Metrics
○ CIDEr
○ METEOR
○ BLEU
○ ROUGE


State of the art

Image description generation and evaluation for the state of the art approaches. [1]


State of the art

Examples of images with two semantically similar descriptions. [1]


Background
● Conditional GAN

○ Conditioned on some extra information y.

Conditional adversarial net. [3]
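
For reference, the two-player objective of a conditional GAN as formulated in [3] can be written as follows, where x is a data sample, z is a noise vector, and both the generator G and the discriminator D receive the extra information y:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x \mid y)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z \mid y))\right)\right]
```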


Background
● Sequential Sampling
○ Non-probability sampling technique
■ Members of the population do not all have an equal chance of being selected
○ Pick a single object or a group of objects in every time interval
○ Analyze the result, then pick another sample
● Monte Carlo tree search
○ A heuristic search algorithm for some kinds of decision processes, especially game playing
○ Steps
■ Selection
■ Expansion
■ Simulation
■ Backpropagation


Proposed Approach

Overall structure of both generator and evaluator of a CGAN. [1]


Overall formulation
● Evaluator

○ Quality of descriptions

● Learning objective
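
A sketch of the two formulas this slide refers to, assuming the notation of [1] (the symbols should be checked against the paper). The evaluator scores an image I and a description S with a sigmoid of the dot product between an image embedding f and a sentence embedding h, and the generator G and evaluator E play a minimax game over that score, with z the random input to the generator:

```latex
r_\eta(I, S) = \sigma\left(\langle f(I; \eta_I),\; h(S; \eta_S) \rangle\right)

\min_\theta \max_\eta \; \mathcal{L}(G_\theta, E_\eta) =
  \mathbb{E}_{S \sim P_I}\left[\log r_\eta(I, S)\right]
  + \mathbb{E}_{z \sim \mathcal{N}_0}\left[\log\left(1 - r_\eta(I, G_\theta(I, z))\right)\right]
```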


The production of sentences is non-differentiable. How can we backpropagate?


Challenges
● Using policy gradient for generating linguistic description
○ Sequential sampling procedure
■ Sampling a discrete token at each time step
○ Non-differentiable
● Expected future reward for early feedback using Monte Carlo rollouts
○ Vanishing gradients
○ Error propagation


Training G
● Policy gradient
○ Action space: words
○ Conditional policy

○ Reward given by the evaluator for a sequence of actions S
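
To make the policy-gradient idea concrete, here is a minimal REINFORCE-style sketch in plain Python. It is a toy under loud assumptions, not the generator of [1]: the policy is a single context-free softmax over a five-word vocabulary (the real policy is conditioned on the image, the noise z, and the words so far), and `toy_reward` is a hypothetical stand-in for the evaluator's score of a sampled sequence S. All names are illustrative.

```python
import math
import random

VOCAB = ["a", "dog", "cat", "runs", "<eos>"]  # toy action space (hypothetical)

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_sentence(logits, length, rng):
    """Sample one word index per time step from the softmax policy."""
    probs = softmax(logits)
    return [rng.choices(range(len(VOCAB)), weights=probs)[0] for _ in range(length)]

def toy_reward(sentence):
    """Stand-in for the evaluator: fraction of steps that chose 'dog'."""
    return sum(1 for w in sentence if VOCAB[w] == "dog") / len(sentence)

def reinforce_step(logits, length, lr, rng, n_samples=32):
    """One gradient-ascent step on the expected reward.

    Uses the score-function estimator: for a softmax policy, the gradient
    of log pi(w) with respect to the logits is onehot(w) - probs.
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(n_samples):
        sentence = sample_sentence(logits, length, rng)
        reward = toy_reward(sentence)
        for w in sentence:
            for k in range(len(logits)):
                indicator = 1.0 if k == w else 0.0
                grad[k] += reward * (indicator - probs[k]) / n_samples
    return [l + lr * g for l, g in zip(logits, grad)]

rng = random.Random(0)
logits = [0.0] * len(VOCAB)
for _ in range(200):
    logits = reinforce_step(logits, length=4, lr=0.5, rng=rng)
dog_prob = softmax(logits)[VOCAB.index("dog")]
```

Because the reward only multiplies the gradient of the log-probability, no gradient has to flow through the discrete sampling step itself.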


Training G
● Early feedback

○ Expected future reward

○ Gradient of the objective w.r.t.
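
The expected future reward of a partial sentence can be approximated by Monte Carlo rollouts: complete the prefix several times with the current policy, score every completion, and average. The sketch below assumes a hypothetical `sample_next` policy and `reward_fn` evaluator passed in as plain functions; the default of 16 rollouts mirrors the n = 16 listed in the experiment settings, and everything else is illustrative.

```python
import random

def rollout_value(prefix, max_len, sample_next, reward_fn, rng, n_rollouts=16):
    """Estimate the expected final reward of a partial sentence.

    Completes `prefix` up to `max_len` tokens `n_rollouts` times using
    `sample_next`, scores each completion with `reward_fn`, and averages.
    """
    total = 0.0
    for _ in range(n_rollouts):
        sentence = list(prefix)
        while len(sentence) < max_len:
            sentence.append(sample_next(sentence, rng))
        total += reward_fn(sentence)
    return total / n_rollouts

# Hypothetical stand-ins for the generator policy and the evaluator.
VOCAB = ["a", "dog", "runs", "fast"]

def uniform_policy(prefix, rng):
    """Pick the next word uniformly at random, ignoring the prefix."""
    return rng.choice(VOCAB)

def toy_evaluator(sentence):
    """Reward 1 only for sentences that start with 'a' and mention 'dog'."""
    return 1.0 if sentence[0] == "a" and "dog" in sentence else 0.0

rng = random.Random(0)
good = rollout_value(["a"], 4, uniform_policy, toy_evaluator, rng)
bad = rollout_value(["dog"], 4, uniform_policy, toy_evaluator, rng)
```

The value of a prefix is available before the sentence is finished, which is what lets the generator receive feedback at every time step instead of only at the end.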


Training E
● Enforcing naturalness and semantic relevance
○ Set of descriptions provided by humans
○ Descriptions from the generator
○ Human descriptions uniformly sampled from other descriptions not associated with the given image
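
One way the three sets of descriptions could enter the evaluator's training objective is sketched below: matching human descriptions should score high, while generated descriptions and human descriptions of other images should score low. This is a hedged illustration, not the implementation from [1]: embeddings are plain Python lists, the score is a sigmoid of a dot product, and the balancing weights `alpha` and `beta` and all function names are assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def evaluator_score(image_vec, sentence_vec):
    """Score in (0, 1): sigmoid of the dot product of the two embeddings."""
    return sigmoid(sum(a * b for a, b in zip(image_vec, sentence_vec)))

def evaluator_objective(image_vec, human, generated, mismatched, alpha=1.0, beta=1.0):
    """Objective for one image, to be maximized by the evaluator.

    human:      embeddings of descriptions written for this image
    generated:  embeddings of descriptions produced by the generator
    mismatched: embeddings of human descriptions of other images
    """
    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    real = mean(math.log(evaluator_score(image_vec, s)) for s in human)
    fake = mean(math.log(1.0 - evaluator_score(image_vec, s)) for s in generated)
    wrong = mean(math.log(1.0 - evaluator_score(image_vec, s)) for s in mismatched)
    return real + alpha * fake + beta * wrong

# Hypothetical 2-d embeddings for illustration only.
image = [1.0, 0.0]
well_separated = evaluator_objective(image, [[3.0, 0.0]], [[-3.0, 0.0]], [[-3.0, 1.0]])
confused = evaluator_objective(image, [[-3.0, 0.0]], [[3.0, 0.0]], [[3.0, 1.0]])
```

The generated set pushes the evaluator toward naturalness, and the mismatched set pushes it toward semantic relevance, since those descriptions are natural but describe a different image.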


Generating paragraphs
● Hierarchical LSTM
○ Sentence level
○ Word level

Adding extension for paragraph generation. [1]


Experiment
● Datasets
○ MSCOCO
○ Flickr30k
● Settings
○ Removing non-alphabet characters
○ Converting to lowercase
○ Replacing infrequent words (those occurring fewer than 5 times) with UNK
○ Max length of 16
○ Pretrain G for 20 epochs based on MLE
○ Pretrain E with supervised training for 5 epochs
○ Batch = 64, lr = 0.0001, n = 16 (Monte Carlo)
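
The text-cleaning settings above can be sketched in a few lines of Python. The slide does not say whether over-long captions are truncated or discarded, nor whether removed characters are replaced by a space, so both choices here are assumptions; the function name and example captions are hypothetical.

```python
import re
from collections import Counter

def preprocess(captions, min_count=5, max_len=16, unk="UNK"):
    """Clean captions: letters only, lowercase, rare words to UNK, cap length."""
    tokenized = []
    for caption in captions:
        # Replace every non-letter, non-whitespace character with a space.
        cleaned = re.sub(r"[^A-Za-z\s]", " ", caption).lower()
        tokenized.append(cleaned.split())
    # Count word frequencies over the whole corpus.
    counts = Counter(word for tokens in tokenized for word in tokens)
    result = []
    for tokens in tokenized:
        mapped = [w if counts[w] >= min_count else unk for w in tokens]
        result.append(mapped[:max_len])  # assumption: truncate, do not drop
    return result

# Hypothetical mini-corpus: "zebra" and "running" occur only once.
captions = ["A dog runs!"] * 5 + ["A zebra, running."]
processed = preprocess(captions)
```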


Experiment
● Performance

Performances of different generators on MSCOCO and Flickr30k. [1]


Has the proposed approach been able to achieve more natural descriptions?


Experiment
● User Study

Human comparison results between each pair of generators. [1]


Experiment
● User Study

Corresponding G-GAN captions for images with similar descriptions in G-MLE. [1]


Experiment
● Diverse descriptions

Generated descriptions with different z. [1]


Experiment
● Evaluating semantic relevance by retrieval

Image rankings for different generators. [1]

S → E-GAN
P → Log-likelihood
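
A retrieval evaluation of this kind can be sketched as follows: each generated caption is scored against every candidate image (with the E-GAN score S or the log-likelihood P), and the rank of the caption's own image is recorded. The exact protocol of [1] is not on the slide, so the score matrix, the function names, and the recall@k summary below are assumptions for illustration.

```python
def retrieval_rank(scores, true_index):
    """1-based rank of the true image when images are sorted by descending score."""
    return 1 + sum(1 for s in scores if s > scores[true_index])

def recall_at_k(ranks, k):
    """Fraction of queries whose true image is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical scores: row i holds the scores of caption i against images 0..2,
# and image i is the one caption i was generated for.
score_matrix = [
    [0.9, 0.2, 0.1],
    [0.4, 0.3, 0.8],
    [0.1, 0.2, 0.7],
]
ranks = [retrieval_rank(row, i) for i, row in enumerate(score_matrix)]
```

A caption that captures the distinguishing content of its image should rank that image near the top, so lower ranks and higher recall@k indicate better semantic relevance.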


Experiment
● Paragraph generation with different z values

Different paragraph descriptions generated by human, G-GAN, and G-MLE with different z values. [1]


Failure Analysis
● Incorrect details
○ Colors
○ Counts
■ Few samples for each special detail
■ Increased risk of including incorrect details due to the focus on diversity


Potential extensions?


Future work
● Use VAE instead of GAN
● Use other similarity metrics instead of dot product in evaluator


Presenters
● Kevin Dsouza
● Ainaz Hajimoradlou


References
1. B. Dai, S. Fidler, R. Urtasun, and D. Lin, Towards Diverse and Natural Image Descriptions via a Conditional GAN, 2017, arXiv preprint arXiv:1703.06029
2. O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge, arXiv preprint arXiv:1609.06647
3. M. Mirza and S. Osindero, Conditional Generative Adversarial Nets, arXiv preprint arXiv:1411.1784


Thank you.
