bbm495-lecture8burcucan/bbm495-lecture8-4pp.pdf · exp(uwvc) dot product compares similarity of o...



Post on 27-Jul-2020


TRANSCRIPT

Page 1: BBM495-Lecture8burcucan/BBM495-Lecture8-4pp.pdf · exp(uwvc) Dot product compares similarity of o and c. Larger dot product = larger probability After taking exponent, normalize over

4/29/19


BBM 495 WORD EMBEDDINGS

2018-2019 SPRING

§ How similar is pizza to pasta?
§ How related is pizza to Italy?

§ Representing words as vectors allows easy computation of similarity
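
As a minimal sketch of what "easy computation of similarity" can look like, the snippet below compares word vectors with cosine similarity. The three-dimensional vectors are hypothetical values invented for illustration; they are not taken from the lecture or from any trained model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings, invented for illustration only.
vectors = {
    "pizza": [0.9, 0.8, 0.1],
    "pasta": [0.8, 0.9, 0.2],
    "Italy": [0.3, 0.4, 0.9],
}

sim_pizza_pasta = cosine_similarity(vectors["pizza"], vectors["pasta"])
sim_pizza_italy = cosine_similarity(vectors["pizza"], vectors["Italy"])
print(f"pizza ~ pasta: {sim_pizza_pasta:.3f}")
print(f"pizza ~ Italy: {sim_pizza_italy:.3f}")
```

With these made-up values, pizza comes out closer to pasta than to Italy; real embeddings would be learned from a corpus and have far more dimensions.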


§ Increase in size with vocabulary
§ Very high dimensional: require a lot of storage
§ Subsequent classification models have sparsity issues
§ Models are less robust
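
The transcript does not name the representation these points refer to. Assuming they describe sparse one-hot vectors, the sketch below shows why such vectors grow with the vocabulary and carry no similarity information. The `one_hot` helper and the toy vocabulary are hypothetical.

```python
def one_hot(word, vocabulary):
    """Return a vector with a single 1 at the word's index in the vocabulary."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1
    return vector

# Hypothetical toy vocabulary; a real one has many thousands of entries.
vocabulary = ["pizza", "pasta", "Italy", "car"]

pizza = one_hot("pizza", vocabulary)
pasta = one_hot("pasta", vocabulary)

# Each vector is as long as the vocabulary, and all but one entry is zero.
print(len(pizza), sum(pizza))

# Distinct words never share a non-zero position, so their dot product is 0
# and the representation says nothing about how similar they are.
dot = sum(x * y for x, y in zip(pizza, pasta))
print(dot)
```

Adding a word to the vocabulary lengthens every vector by one entry, which is the storage and sparsity problem the bullets point at.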


Predict!


§ Remember: two vectors are similar if they have a high dot product
§ Cosine is just a normalized dot product

§ So: Similarity(o, c) ∝ u_o · v_c

§ We will need to normalize to get a probability

§ We use softmax to turn the dot products into probabilities:
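
The formula on this slide did not survive extraction. Based on the fragment in the page header ("exp(uwvc) ... After taking exponent, normalize over") and the similarity defined above, it is presumably the usual skip-gram softmax, where V is assumed to denote the vocabulary:

$$p(o \mid c) = \frac{\exp(u_o^\top v_c)}{\sum_{w \in V} \exp(u_w^\top v_c)}$$

A minimal sketch of that computation follows. The vectors are hypothetical values, and `softmax_probabilities` is an illustrative helper, not code from the lecture.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax_probabilities(center_vector, outside_vectors):
    """Return p(o | c) for every candidate word o, given the center vector v_c."""
    scores = {word: dot(u, center_vector) for word, u in outside_vectors.items()}
    # Subtracting the maximum score avoids overflow and leaves the result unchanged.
    max_score = max(scores.values())
    exps = {word: math.exp(s - max_score) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

# Hypothetical outside vectors u_w and center vector v_c, for illustration only.
outside_vectors = {
    "pasta": [0.8, 0.9, 0.2],
    "Italy": [0.3, 0.4, 0.9],
    "car": [-0.5, -0.2, 0.1],
}
v_pizza = [0.9, 0.8, 0.1]

probs = softmax_probabilities(v_pizza, outside_vectors)
for word, p in probs.items():
    print(f"p({word} | pizza) = {p:.3f}")
```

A larger dot product with the center vector gives a larger probability, and the probabilities over all candidate words sum to 1.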


§ Take gradients at each window

§ Go through gradient for each center vector v in a window
§ In each window, we will compute updates for all parameters that are being used in that window. For example:
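
The slide's own example is not recoverable from the transcript. As a stand-in, here is a minimal sketch of per-window stochastic gradient updates, assuming the skip-gram loss -log p(o | c) with a full softmax. The function names, toy corpus, initial vectors, and learning rate are all hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sgd_step(U, V, center, outside, lr=0.1):
    """Update vectors for one (center, outside) pair; return the loss before the update.

    U[w] is the outside vector u_w and V[w] is the center vector v_w.
    """
    v_c = V[center]
    probs = softmax([dot(u_w, v_c) for u_w in U])
    loss = -math.log(probs[outside])

    # d loss / d v_c = sum_w p(w|c) u_w - u_o
    grad_v = [
        sum(p * u_w[i] for p, u_w in zip(probs, U)) - U[outside][i]
        for i in range(len(v_c))
    ]
    # d loss / d u_w = (p(w|c) - [w == o]) v_c, using the old v_c
    for w, p in enumerate(probs):
        coeff = p - (1.0 if w == outside else 0.0)
        U[w] = [u - lr * coeff * v for u, v in zip(U[w], v_c)]
    # Only the center vector of this window is touched in V.
    V[center] = [v - lr * g for v, g in zip(v_c, grad_v)]
    return loss

# Hypothetical toy setup: 4 word ids, 2-dimensional vectors, fixed initial values.
U = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.1], [-0.2, 0.2]]
V = [[0.2, -0.1], [0.1, 0.3], [-0.1, 0.1], [0.0, 0.2]]
corpus = [0, 1, 2, 1, 3]  # word ids
window = 1

def epoch():
    """Slide a window over the corpus and update after every (center, outside) pair."""
    total = 0.0
    for pos, center in enumerate(corpus):
        for offset in range(-window, window + 1):
            ctx = pos + offset
            if offset == 0 or ctx < 0 or ctx >= len(corpus):
                continue
            total += sgd_step(U, V, center, corpus[ctx])
    return total

first = epoch()
for _ in range(50):
    last = epoch()
print(f"summed loss, first epoch: {first:.3f}; last epoch: {last:.3f}")
```

Note that with a full softmax every outside vector receives a gradient, while only the center vector of the current window changes in `V`. Approximations such as negative sampling, which this sketch does not implement, restrict the update to the few words actually involved.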


§ Dense Vectors, Dan Jurafsky
§ Representation for Language: from Word Embeddings to Sentence Meanings, Christopher Manning, Stanford University, 2017
§ Natural Language Processing with Deep Learning, Richard Socher, Stanford University
§ More Word Vectors, Richard Socher, Stanford University
§ Improving Distributional Similarity with Lessons Learned from Word Embeddings, Omer Levy,