
 Neural Networks in NLP: The Curse of Indifferentiability Lili Mou [email protected] http://sei.pku.edu.cn/~moull12


Post on 12-Sep-2020


Page 1: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!

   

Neural Networks in NLP: The Curse of Indifferentiability

Lili Mou
[email protected]
http://sei.pku.edu.cn/~moull12

Page 2

   

Outline

● Preliminary
● Indifferentiability, solutions, and applications
– The curse of indifferentiability
– Solutions: Attention, reinforcement learning, etc.
– Applications: Sequence-level objective, SeqGAN, etc.

● A case study in semantic parsing

Page 3

   

The Curse of Indifferentiability

● Characters are discrete!
● Words are discrete!
● Phrases are discrete!
● Sentences are discrete!
● Paragraphs are discrete!
● All symbols are discrete!

● Word embeddings are continuous but are nothing!

Page 4

   

Indifferentiability

[Figure: cross-entropy (CE) loss at each of six output positions]

Page 5

   

Indifferentiability

[Figure: cross-entropy (CE) loss at each of six output positions]

Risk (e.g., BLEU)

Page 7

   

Indifferentiability

● Input: word embeddings
● Output: argmax p(word)
● Risk: a function of output
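The three items above can be illustrated numerically. The following is a minimal sketch, not taken from the slides: the 0/1 `risk` function, the reference index, and the toy logits are all hypothetical. It compares a finite-difference gradient of a risk computed on the argmax output with the gradient of a cross-entropy loss on the same logits.

```python
import numpy as np

def softmax(logits):
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def hard_output(logits):
    """Discrete decision: index of the most probable word."""
    return int(np.argmax(softmax(logits)))

def risk(word_index, reference_index=2):
    """Hypothetical 0/1 risk: 0 if the chosen word matches the reference."""
    return 0.0 if word_index == reference_index else 1.0

def finite_difference_grad(fn, x, eps=1e-4):
    """Central-difference estimate of the gradient of fn at x."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (fn(x + step) - fn(x - step)) / (2 * eps)
    return grad

logits = np.array([2.0, 0.5, 1.0, -1.0])  # toy scores over a 4-word vocabulary

# The risk sees only the argmax, which does not move under small perturbations.
grad_risk = finite_difference_grad(lambda l: risk(hard_output(l)), logits)

# Cross-entropy sees the full distribution and changes smoothly.
grad_ce = finite_difference_grad(lambda l: -np.log(softmax(l)[2]), logits)

print("gradient through argmax + risk:", grad_risk)
print("gradient of cross-entropy:     ", grad_ce)
```

In this toy setting the risk gradient vanishes everywhere except at decision boundaries, so it carries no learning signal, whereas the cross-entropy gradient is informative.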

Page 8

   

Outline

● Preliminary
● Indifferentiability, solutions, and applications
– The curse of indifferentiability
– Solutions: Attention, reinforcement learning, etc.
– Applications: Sequence-level objective, SeqGAN, etc.

● A case study in semantic parsing

Page 9

   

Solution: Attempt #1

Classification of a particular word

=> Regression of word embeddings

Page 10

   

Solution: Attempt #1

Classification of a particular word

=> Regression of word embeddings

● Total failure (but why?) 

Page 11

   

Solution: Attempt #2

● Attention (weighted sum)
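A minimal sketch of attention as a weighted sum, assuming dot-product scoring (the slide does not specify the scoring function; `soft_attention` and the toy shapes are illustrative names and values). Instead of picking one position with a hard argmax, every position contributes in proportion to its softmax weight, so the result is differentiable in the query, keys, and values.

```python
import numpy as np

def softmax(scores):
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()

def soft_attention(query, keys, values):
    """Return the attention weights and the weighted sum of the values."""
    scores = keys @ query        # one dot-product score per position (assumed scoring)
    weights = softmax(scores)    # continuous weights instead of a discrete choice
    context = weights @ values   # weighted sum over positions
    return weights, context

rng = np.random.default_rng(0)
keys = rng.normal(size=(5, 8))     # 5 positions, 8-dimensional keys
values = rng.normal(size=(5, 8))   # one value vector per position
query = rng.normal(size=8)

weights, context = soft_attention(query, keys, values)
print("weights:", np.round(weights, 3))
print("context shape:", context.shape)
```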

Page 12

   

Solution: Attempt #3

● Reinforcement learning (Trial-and-error)
– Sample an action (sequence)
– See what the reward is

Page 13

REINFORCE

● Define an external cost function on a generated sequence
● Generate words by sampling
● Take the derivative of generated samples
● $\partial J = \sum_w [\partial p(w \mid \ldots)]\, r(w) = \sum_w p(w)\, [\partial \log p(w)]\, r(w)$

Ranzato, Marc'Aurelio, et al. "Sequence Level Training with Recurrent Neural Networks." ICLR, 2016.
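The REINFORCE identity can be checked numerically in a toy case. The sketch below assumes a single categorical choice over four words parameterized by softmax logits, with made-up rewards r(w); the function names are illustrative and this is not the sequence-level training procedure of the cited paper. It compares the exact gradient of J = Σ_w p(w) r(w) with the sampled estimate that averages r(w) ∂ log p(w).

```python
import numpy as np

def softmax(logits):
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def exact_gradient(logits, rewards):
    """Closed-form gradient of J = sum_w p(w) r(w) with respect to the logits."""
    p = softmax(logits)
    return p * (rewards - p @ rewards)

def reinforce_gradient(logits, rewards, num_samples, rng):
    """Monte Carlo estimate: average of r(w) * d log p(w) / d logits over samples."""
    p = softmax(logits)
    samples = rng.choice(len(p), size=num_samples, p=p)
    one_hot = np.eye(len(p))[samples]
    grad_log_p = one_hot - p     # gradient of log softmax at the sampled word
    return (rewards[samples][:, None] * grad_log_p).mean(axis=0)

rng = np.random.default_rng(0)
logits = np.array([0.5, -0.2, 1.0, 0.0])   # toy model scores (hypothetical)
rewards = np.array([0.1, 0.9, 0.3, 0.6])   # toy rewards r(w) (hypothetical)

exact = exact_gradient(logits, rewards)
estimate = reinforce_gradient(logits, rewards, num_samples=200_000, rng=rng)
print("exact:   ", np.round(exact, 4))
print("estimate:", np.round(estimate, 4))
```

The estimate only needs samples and their rewards, never a derivative of r, which is why it applies to non-differentiable measures such as BLEU.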

Page 14

Caveats

● REINFORCE may be extremely difficult to train
– Hard to get started
– Poor local optima
– Sensitive to hyperparameters

● Supervised pretraining

Page 15

Solution: Attempt #4 

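● Gumbel-softmax
– Sample from a class distribution: $z = \mathrm{one\_hot}\big(\arg\max_i [g_i + \log \pi_i]\big)$

where $g_i = -\log(-\log u_i)$, $u_i \sim \mathrm{Uniform}(0, 1)$

– Softmax approximation: $y_i = \dfrac{\exp((\log \pi_i + g_i)/\tau)}{\sum_j \exp((\log \pi_j + g_j)/\tau)}$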

Jang, Eric, Shixiang Gu, and Ben Poole. "Categorical Reparameterization with Gumbel-Softmax." ICLR, 2017.
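A minimal NumPy sketch of the two steps, following the formulation in the cited paper: Gumbel noise g = -log(-log u) with u ~ Uniform(0, 1), a hard sample via argmax of log π + g, and the temperature-τ softmax relaxation. The function names and the toy class distribution are illustrative assumptions.

```python
import numpy as np

def sample_gumbel(shape, rng):
    """Standard Gumbel noise: g = -log(-log(u)), u ~ Uniform(0, 1)."""
    u = rng.uniform(low=1e-12, high=1.0, size=shape)
    return -np.log(-np.log(u))

def gumbel_max(log_probs, rng):
    """Hard (non-differentiable) sample from the class distribution."""
    return int(np.argmax(log_probs + sample_gumbel(log_probs.shape, rng)))

def gumbel_softmax(log_probs, temperature, rng):
    """Relaxed sample: softmax of the perturbed log-probabilities over temperature."""
    y = (log_probs + sample_gumbel(log_probs.shape, rng)) / temperature
    y = y - y.max()
    exp = np.exp(y)
    return exp / exp.sum()

rng = np.random.default_rng(0)
probs = np.array([0.6, 0.3, 0.1])   # toy class distribution (hypothetical)
log_probs = np.log(probs)

print("tau = 0.1  :", np.round(gumbel_softmax(log_probs, 0.1, rng), 3))
print("tau = 1.0  :", np.round(gumbel_softmax(log_probs, 1.0, rng), 3))
print("tau = 1000 :", np.round(gumbel_softmax(log_probs, 1000.0, rng), 3))
```

Lower temperatures push the relaxed sample towards a one-hot vector, while very high temperatures flatten it towards uniform.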

Page 16

Solution: Attempt #4 

● Interpolation between one-hot (temperature → 0) and uniform (temperature → ∞), with class distribution information

Page 17

   

Outline

● Preliminary
● Indifferentiability, solutions, and applications
– The curse of indifferentiability
– Solutions: Attention, reinforcement learning, etc.
– Applications: Sequence-level objective, SeqGAN, etc.

● A case study in semantic parsing

Page 18

   

Application: Sequence-Level Objective

● REINFORCE towards BLEU
● Annealing
– For 1..T words
– Supervised training: 1..t
– RL: t+1..T
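The annealing split can be sketched as a schedule over positions. This is an illustrative sketch only: the slide states that positions 1..t are trained with supervision and t+1..T with RL, while the fixed `step` by which t shrinks, and the choice to shrink t all the way to 0, are assumptions made here.

```python
def split_positions(sequence_length, supervised_prefix):
    """Return (supervised positions, RL positions), both 1-indexed."""
    t = max(0, min(supervised_prefix, sequence_length))
    supervised = list(range(1, t + 1))                    # trained with cross-entropy
    reinforce = list(range(t + 1, sequence_length + 1))   # trained with REINFORCE
    return supervised, reinforce

def annealing_schedule(sequence_length, step):
    """Yield one split per training stage, shrinking the supervised prefix by `step`."""
    t = sequence_length
    while True:
        yield split_positions(sequence_length, t)
        if t == 0:
            break
        t = max(0, t - step)

for stage, (supervised, reinforce) in enumerate(annealing_schedule(6, step=2)):
    print(f"stage {stage}: supervised={supervised} RL={reinforce}")
```

Starting fully supervised and handing positions over to RL gradually is one way to address the "hard to get started" caveat, since the model begins from a reasonable policy.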

Page 19

   

Results

Shen, Shiqi, Yong Cheng, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu. "Minimum risk training for neural machine translation." ACL, 2016.

Page 20

   

Application: SeqGAN

Yu, Lantao, Weinan Zhang, Jun Wang, and Yong Yu. "SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient." AAAI, 2017.

Page 21

Generative Adversarial Network

● Two agents:
– Generative model: Generate new samples that are as similar as possible to the data
– Discriminative model: Distinguish generated samples (in disguise) from real data

● Each agent takes a step in turn

 

Page 22

Objective of GAN

● G(z): A sample generated from noise z, where z is drawn from a prior distribution
● D(x) = Estimated (by D) prob. that x is a real data sample
– D(x)=1: D regards x as a training sample w.p.1
– D(x)=0: D regards x as a generated sample w.p.1

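$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$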
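A small numeric sketch of the value function, assuming the standard GAN form V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))], which D maximizes and G minimizes. The discriminator outputs below are made-up numbers rather than outputs of a trained model, and `value_function` is an illustrative name.

```python
import numpy as np

def value_function(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) from D's outputs on real and generated samples."""
    d_real = np.clip(d_real, 1e-12, 1.0 - 1e-12)   # avoid log(0)
    d_fake = np.clip(d_fake, 1e-12, 1.0 - 1e-12)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# D cannot tell real from generated: D(x) = D(G(z)) = 0.5 everywhere.
confused = value_function(np.full(4, 0.5), np.full(4, 0.5))

# D separates them almost perfectly: D(x) close to 1, D(G(z)) close to 0.
confident = value_function(np.full(4, 0.99), np.full(4, 0.01))

print("V when D is fooled:   ", round(float(confused), 4))
print("V when D is confident:", round(float(confident), 4))
```

A discriminator that separates the two sets drives V towards 0 from below, while a fully fooled discriminator gives V = -2 log 2.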

Page 23

Objective of GAN

V(D,G)

Algorithm

$\max_D V$

Page 24

Curse of Indifferentiability

V(D,G)

Algorithm

$\max_D V$

Page 25

Solution

● REINFORCE!

● Does SeqGAN provide a more powerful density estimator?

Page 26

   

Application: Rationalizing neural predictions

Lei, Tao, Regina Barzilay, and Tommi Jaakkola. "Rationalizing neural predictions." EMNLP, 2016.

Page 27

   

Objective

Page 28

   

Training

● REINFORCE!

Page 29

   

Results

Red: Appearance
Blue: Smell
Green: Palate