![Page 1: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/1.jpg)
Neural Networks in NLP:The Curse of Indifferentiability
Lili [email protected]://sei.pku.edu.cn/~moull12
![Page 2: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/2.jpg)
Outline
● Preliminary● Indifferentiability, solutions, and applications
– The curse of indifferentiability– Solutions: Attention, reinforcement learning, etc.– Applications: Sequencelevel objective, SeqGAN, etc.
● A case study in semantic parsing
![Page 3: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/3.jpg)
The Curse of Indifferentiability
● Characters are discrete!● Words are discrete!● Phrases are discrete!● Sentences are discrete!● Paragraphs are discrete!● All symbols are discrete!
● Word embeddings are continuous but are nothing!
![Page 4: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/4.jpg)
Indifferentiability
CE CE CE CE CE CE
![Page 5: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/5.jpg)
Indifferentiability
CE CE CE CE CE CE
Risk (e.g., BLEU)
![Page 6: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/6.jpg)
Indifferentiability
CE CE CE CE CE CE
Risk (e.g., BLEU)
![Page 7: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/7.jpg)
Indifferentiability
● Input: word embeddings �● Output: argmax p(word) �● Risk: a function of output �
![Page 8: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/8.jpg)
Outline
● Preliminary● Indifferentiability, solutions, and applications
– The curse of indifferentiability
– Solutions: Attention, reinforcement learning, etc.– Applications: Sequencelevel objective, SeqGAN, etc.
● A case study in semantic parsing
![Page 9: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/9.jpg)
Solution: Attempt #1
Classification of a particular word
=> Regression of word embeddings
![Page 10: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/10.jpg)
Solution: Attempt #1
Classification of a particular word
=> Regression of word embeddings
● Total failure (but why?)
�
![Page 11: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/11.jpg)
Solution: Attempt #2
● Attention (weighted sum)
![Page 12: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/12.jpg)
Solution: Attempt #3
● Reinforcement learning (Trialanderror)– Sample an action (sequence)– See what the reward is
![Page 13: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/13.jpg)
∂
REINFORCE
● Define an external cost function on a generated sequence● Generate words by sampling● Take the derivative of generated samples
● J = [ p(w|...)] r(w) = p(w)[ log p(w)] r(w)∂ ∑w
∂∑w
Ranzato, Marc'Aurelio, et al. "Sequence Level Training with Recurrent Neural Networks." ICLR, 2016.
![Page 14: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/14.jpg)
Caveats
● REINFORCE may be extremely difficult to train– Hard to get started– Poor local optima– Sensitive to hyperparameters
● Supervised pretraining
![Page 15: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/15.jpg)
Solution: Attempt #4
● Gumble softmax– Sample from a class distribution
where
– Softmax approximation
Jang, Eric, Shixiang Gu, and Ben Poole. "Categorical Reparameterization with GumbelSoftmax." ICLR, 2017.
![Page 16: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/16.jpg)
Solution: Attempt #4
● Interpolation between onehot and uniform (with class distribution information)
![Page 17: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/17.jpg)
Outline
● Preliminary● Indifferentiability, solutions, and applications
– The curse of indifferentiability– Solutions: Attention, reinforcement learning, etc.
– Applications: Sequencelevel obj., SeqGAN, etc.
● A case study in semantic parsing
![Page 18: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/18.jpg)
Application: SequenceLevel Objective
● REINFORCE towards BLEU● Annealing
– For 1..T words– Supervised training: 1..t– RL: t+1..T
![Page 19: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/19.jpg)
Results
Shen, Shiqi, Yong Cheng, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu. "Minimum risk training for neural machine translation." ACL, 2016.
![Page 20: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/20.jpg)
Application: SeqGAN
Yu, Lantao, Weinan Zhang, Jun Wang, and Yong Yu. "Seqgan: sequence generative adversarial nets with policy gradient." In AAAI. 2017.
![Page 21: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/21.jpg)
Generative Adversarial Network
● Two agents: – Generative model: Generate new samples that are as similar
as the data
– Discriminative model: Distinguish samples in disguise
● Each agent takes a step in turn
![Page 22: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/22.jpg)
Objective of GAN
● G(z): A generated sample from distribution z● D(x) = Estimated (by D) prob. that x is a real data sample
– D(x)=1: D regards x as a training sample w.p.1
– D(x)=0: D regards x as a generative sample w.p.1
V(D,G)
![Page 23: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/23.jpg)
Objective of GAN
V(D,G)Algorithm
max VD
![Page 24: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/24.jpg)
Curse of Indifferentiability
V(D,G)Algorithm
max VD
![Page 25: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/25.jpg)
Solution
● REINFORCE!
● Does SeqGAN provide a more powerful density estimator?
![Page 26: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/26.jpg)
Application: Rationale neural predictions
Lei, Tao, Regina Barzilay, and Tommi Jaakkola. "Rationalizing neural predictions." EMNLP, 2016.
![Page 27: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/27.jpg)
Objective
●
●
●
![Page 28: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/28.jpg)
Training
● REINFORCE!
![Page 29: Neural Networks in NLP: The Curse of Indifferentiability · The Curse of Indifferentiability Characters are discrete! Words are discrete! Phrases are discrete! Sentences are discrete!](https://reader034.vdocuments.mx/reader034/viewer/2022051804/5fef94c6c42b1e6c0b0f1e02/html5/thumbnails/29.jpg)
Results
Red: appearanceBlue: SmellGreen: Palate