Assignment 1 - Department of Computer Science, University of Toronto
Assignment 1: Learning Distributed Word Representations
Jimmy Ba (csc321ta@cs.toronto.edu)
Background
• Text and language play a central role in a wide range of computer science and engineering problems
• Applications that depend on language understanding/processing include speech processing, internet search/query, social media, recommendation systems, artificial intelligence, and many others
Motivation
• Getting meaningful representations from text data is often the key component in the Google search engine or your next big start-up idea

[Figure: raw text ("Zac Efron and his parents ...") is turned into some processed representation, which then feeds a classifier, e.g. "Is the NASDAQ index going up or down?"]
[Build: the "some processed representation" box is a language model.]
Language Model
• We need to represent text data in a way that is “easy” for the later-stage classification problems or learning algorithms
• “Easy”: able to handle a large-scale vocabulary, and words with similar syntactic/semantic meaning should be close in the representation space
Language Model

[Figure: the words “Zac”, “Efron”, “and”, “his”, “parents” under two encodings.
one-of-K encoding (vector length = vocabulary size): each word is a vector with a single 1, e.g. 1 0 0 0 0 0 0 0, 0 1 0 0 0 0 0 0, 0 0 1 0 0 0 0 0, ...
binary encoding (vector length = log(vocabulary size)): each word index written in binary, e.g. 0 0 0, 1 0 0, 0 1 0, 1 1 0, ...]
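The size contrast between the two encodings can be sketched in a few lines; the vocabulary size of 8 is illustrative, not the assignment's 250-word vocabulary.

```python
import numpy as np

V = 8                       # illustrative vocabulary size
bits = int(np.log2(V))      # binary code length = log2(V)

# one-of-K: each word is a length-V vector with a single 1
one_of_k = np.eye(V, dtype=int)

# binary: each word index written in log2(V) bits
binary = np.array([list(np.binary_repr(i, width=bits)) for i in range(V)],
                  dtype=int)

assert one_of_k.shape == (V, V)    # grows linearly with the vocabulary
assert binary.shape == (V, bits)   # grows only logarithmically
```

Binary codes are compact, but unlike the distributed encodings introduced next, the bit patterns carry no information about word meaning.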
Language Model

[Figure: the same words under one-of-K encoding (vector length = vocabulary size) versus a distributed encoding (vector length = embedding size, a constant): each one-hot row, e.g. 0 0 0 0 0 0 0 1 for “parents”, maps to a dense real-valued vector such as 1.5 0.1 -0.1 2.1, 0.7 -0.1 0.3 0.4, 0.1 1.6 -1.9 1.1, 3.5 0.2 1.1 -2.5, -2.1 -3.3 -2.7 1.9]
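The distributed encoding amounts to a lookup table of dense vectors. A minimal sketch, with random values standing in for learnt embeddings and an embedding size of 4 chosen only for illustration:

```python
import numpy as np

vocab = ["Zac", "Efron", "and", "his", "parents"]
V, D = len(vocab), 4                  # vocabulary size, embedding size (constant)

one_of_k = np.eye(V)                  # one-of-K rows, length V

rng = np.random.default_rng(0)
embedding = rng.normal(size=(V, D))   # learnt in practice, random here

idx = vocab.index("Efron")
# multiplying a one-hot row by the table is just a row lookup:
dense = one_of_k[idx] @ embedding
```

This is why the first layer of a neural language model can be implemented as indexing rather than a matrix multiply.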
Neural Language Model

[Architecture diagram: three context words (“Zac”, “Efron”, “is”), each a word index into a 250-word vocabulary, are mapped through a shared representation matrix R to 16-dimensional word reps; the three reps feed through weights W1 into a hidden layer of 128 units, then through W2 to predict word 4 (“hot”).]
Neural Language Model

[Same architecture, annotated with activations: embedding layer e, hidden layer h = sigmoid(z_hidden), output y = softmax(z_output), trained with the cross-entropy cost C(W). In the code, these correspond to activations.embedding_layer, activations.hidden_layer, and activations.output_layer.]
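The forward pass of the network above can be sketched with the stated sizes (250-word vocabulary, 16-D embeddings, 128 hidden units, 3 context words). This is a minimal NumPy sketch, not the assignment's starter code: the weights are random stand-ins and the function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, H, CONTEXT = 250, 16, 128, 3      # vocab, embedding, hidden, context words

R  = rng.normal(0, 0.1, size=(V, D))            # shared word representations
W1 = rng.normal(0, 0.1, size=(CONTEXT * D, H))  # embedding -> hidden
W2 = rng.normal(0, 0.1, size=(H, V))            # hidden -> output

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(word_idx):
    # word_idx: (batch, 3) integer indices of the three context words
    e = R[word_idx].reshape(word_idx.shape[0], -1)  # embedding layer
    h = sigmoid(e @ W1)                             # hidden layer
    y = softmax(h @ W2)                             # distribution over word 4
    return e, h, y

e, h, y = forward(np.array([[0, 1, 2]]))
```

Note that the embedding step is just three row lookups into the shared matrix R, one per context word, concatenated before W1.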
Things you need to do in assignment 1
• Part 2

[Architecture diagram repeated, now annotated with the gradients to compute: ∂C/∂z_output, ∂C/∂W2, ∂C/∂W1]
Things you need to do in assignment 1
• Part 2
• code/derive the partial derivative of the cross-entropy cost with respect to the softmax input
• code/derive the gradients of the weight matrices using the partial derivatives from backprop
• can be done in just 5 simple lines of code with NO for-loops
• use checking.check_gradients to verify the correctness of the 5 lines of code
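The vectorized gradients above can be sketched as follows. This is an illustrative NumPy version, not the assignment's starter code: the array names (e_act, h, y, t) and shapes are assumptions, and the batch-averaged cross-entropy cost is assumed.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def backprop_gradients(e_act, h, y, t, W2):
    """Vectorized backprop for the softmax + cross-entropy head (no for-loops).

    e_act: embedding-layer activations, shape (batch, context * embed)
    h:     sigmoid hidden activations,  shape (batch, hidden)
    y:     softmax outputs,             shape (batch, vocab)
    t:     one-hot targets,             shape (batch, vocab)
    W2:    hidden-to-output weights,    shape (hidden, vocab)
    """
    n = y.shape[0]
    dz_output = (y - t) / n                       # dC/dz_output
    dW2 = h.T @ dz_output                         # dC/dW2
    dz_hidden = (dz_output @ W2.T) * h * (1 - h)  # chain through the sigmoid
    dW1 = e_act.T @ dz_hidden                     # dC/dW1
    return dz_output, dW2, dW1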
Things you need to do in assignment 1
• Part 3
• analyze the trained model

t-SNE embedding
• It projects the 16-D learnt word embeddings to 2-D, for plotting/visualization only. (display_nearest_words and word_distance use the 16-D word embeddings.)
Word Distance
• It only makes sense to compare relative distances between words, e.g.
• distance(A, B) vs. distance(A, C)
• distance(A, B) vs. distance(A, w), distance(B, w)
• but NOT distance(A, B) vs. distance(C, D)
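To make the "relative distances only" point concrete, here is a minimal sketch with random 16-D vectors standing in for learnt embeddings; the words, the values, and the choice of Euclidean distance are illustrative, not taken from the trained model or from word_distance.

```python
import numpy as np

# random stand-ins for learnt 16-D word embeddings (illustrative only)
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=16) for w in ["day", "night", "government"]}

def distance(a, b):
    # Euclidean distance between two word vectors
    return float(np.linalg.norm(emb[a] - emb[b]))

# meaningful: compare distance(A, B) against distance(A, C)
closer_to_night = distance("day", "night") < distance("day", "government")
# not meaningful on its own: the raw value of any single distance
```

The absolute scale of the embedding space is arbitrary, which is why only comparisons sharing a common word carry information.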
Things you need to do in assignment 1
• Part 3
• Think about how the model would put two words close together in the embedding space
• Think about what task the model is trying to achieve, and how that affects the word representations being learned
• Think about what kind of similarity the nearest words in the 16-D embedding space share
Due: Tuesday, Feb. 3
at the start of lecture