Deep Learning for Natural Language Processing
The Transformer model
Richard Johansson
drawbacks of recurrent models
▶ even with GRUs and LSTMs, it is difficult for RNNs to preserve information over long distances
▶ we introduced attention as a way to deal with this problem
▶ can we skip the RNN and just use attention?
attention models: recap
▶ first, compute an "energy" $e_i$ for each state $h_i$
▶ for the attention weights, we apply the softmax:
    $\alpha_i = \frac{\exp e_i}{\sum_{j=1}^{n} \exp e_j}$
▶ finally, the "summary" is computed as a weighted sum:
    $s = \sum_{i=1}^{n} \alpha_i h_i$
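To make the recap concrete, here is a minimal NumPy sketch of this computation. The slide leaves the energy function unspecified; the dot product with a hypothetical query vector q used below is an assumption for illustration only.

```python
import numpy as np

def attention_summary(H, q):
    """Attention recap: energies -> softmax weights -> weighted sum.

    H: array of shape (n, d), the states h_1 ... h_n
    q: array of shape (d,), a query vector (assumed energy function
       e_i = h_i . q; the slide does not fix a particular choice)
    """
    e = H @ q                        # energies e_i
    alpha = np.exp(e - e.max())      # softmax, numerically stable
    alpha = alpha / alpha.sum()      # attention weights alpha_i
    s = alpha @ H                    # summary: sum_i alpha_i h_i
    return s, alpha

# usage: 5 states of dimension 4
H = np.random.randn(5, 4)
q = np.random.randn(4)
s, alpha = attention_summary(H, q)
print(alpha.sum())  # ≈ 1.0
```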
the Transformer
▶ the Transformer (Vaswani et al., 2017) is an architecture that uses attention for information flow: "Attention is all you need"
▶ it was originally designed for machine translation and has two parts:
    ▶ an encoder that "summarizes" an input sentence
    ▶ a decoder (a conditional LM) that generates an output, based on the input
▶ let's consider the encoder
multi-head attention
▶ in each layer, the Transformer applies several attention models ("heads") in parallel
▶ intuitively, the heads are "looking" for different types of information
▶ each attention head computes a scaled dot-product attention (see the sketch below):
    $e_{ij} = \frac{1}{\sqrt{d}}\, q_i \cdot k_j$
    $\alpha_i = \mathrm{softmax}(e_i)$
  where $q_i$ and $k_j$ are linear transformations of the input at positions $i$ and $j$
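As a concrete illustration, here is a minimal NumPy sketch of scaled dot-product attention applied by several heads in parallel. The head count, dimensions, and random projection matrices are placeholder assumptions, not the exact parameterization of Vaswani et al. (2017).

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """e_ij = (1/sqrt(d)) q_i . k_j, softmax over j, then weighted sum of values."""
    d = Q.shape[-1]
    e = Q @ K.transpose(0, 2, 1) / np.sqrt(d)   # (heads, n, n) energies
    alpha = softmax(e, axis=-1)                 # attention weights per query position
    return alpha @ V                            # (heads, n, d_head)

def multi_head_attention(X, num_heads=4, d_head=16, seed=0):
    """Each head applies its own linear maps to get queries, keys, values.

    X: (n, d_model). The projection matrices here are random placeholders;
    in a real model they are learned parameters.
    """
    rng = np.random.default_rng(seed)
    n, d_model = X.shape
    Wq = rng.standard_normal((num_heads, d_model, d_head))
    Wk = rng.standard_normal((num_heads, d_model, d_head))
    Wv = rng.standard_normal((num_heads, d_model, d_head))
    Wo = rng.standard_normal((num_heads * d_head, d_model))

    Q = np.einsum('nd,hdk->hnk', X, Wq)              # queries, one set per head
    K = np.einsum('nd,hdk->hnk', X, Wk)              # keys
    V = np.einsum('nd,hdk->hnk', X, Wv)              # values
    heads = scaled_dot_product_attention(Q, K, V)    # (heads, n, d_head)
    concat = heads.transpose(1, 0, 2).reshape(n, -1) # concatenate the heads
    return concat @ Wo                               # project back to d_model

# usage: a "sentence" of 6 positions with model dimension 32
X = np.random.randn(6, 32)
print(multi_head_attention(X).shape)  # (6, 32)
```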
a layer in the Transformer encoder
▶ after each application of multi-head attention, a 2-layer feedforward model (with ReLU activation) is applied
▶ residual connections ("shortcuts") and layer normalization (Ba et al., 2016) are added for robustness and to facilitate training
▶ the Transformer encoder consists of a stack of this type of block (see the sketch below)
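Putting the pieces together, here is a minimal NumPy sketch of one encoder block: multi-head attention, then a 2-layer ReLU feedforward net, each wrapped with a residual connection and layer normalization. It assumes the multi_head_attention function sketched above is in scope and uses random placeholder weights, so it illustrates the data flow rather than a trained model.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Layer normalization (Ba et al., 2016): normalize each position's features."""
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def feed_forward(x, W1, b1, W2, b2):
    """2-layer feedforward network with ReLU, applied at every position."""
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

def encoder_block(X, d_ff=64, seed=0):
    """One Transformer encoder block: attention and feedforward sub-layers,
    each followed by a residual connection and layer normalization.

    Weights are random placeholders; in a real model they are learned.
    Relies on the multi_head_attention sketch above.
    """
    rng = np.random.default_rng(seed)
    n, d_model = X.shape
    W1 = rng.standard_normal((d_model, d_ff))
    b1 = np.zeros(d_ff)
    W2 = rng.standard_normal((d_ff, d_model))
    b2 = np.zeros(d_model)

    # attention sub-layer with residual connection ("shortcut") + layer norm
    h = layer_norm(X + multi_head_attention(X))
    # feedforward sub-layer with residual connection + layer norm
    return layer_norm(h + feed_forward(h, W1, b1, W2, b2))

# usage: stack a few blocks, as in the Transformer encoder
X = np.random.randn(6, 32)
for i in range(3):
    X = encoder_block(X, seed=i)
print(X.shape)  # (6, 32)
```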
the road ahead
▶ the full Transformer is an effective model for machine translation
▶ we'll return to it when we discuss encoder–decoder architectures
▶ for now, let's use it simply as a pre-trained representation
references
J. Ba, J. Kiros, and G. Hinton. 2016. Layer normalization. arXiv:1607.06450.
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. 2017. Attention is all you need. In NIPS 30.
J. Vig. 2019. Visualizing attention in transformer-based language representation models. arXiv:1904.02679.