TensorFlow Deep Learning Quick Start Course -- Natural Language Processing Applications


TensorFlow Deep Learning Quick Start Course

Part 4: Natural Language Processing Applications

By Mark Chang

•  Introduction to Natural Language Processing
•  The Word2vec Neural Network
•  Hands-on Semantic Computation

Introduction to Natural Language Processing

Natural Language Processing

•  Natural language processing is a branch of artificial intelligence and linguistics.
   –  It studies how to process and make use of natural language.
•  Natural language understanding systems
   –  Convert natural language into a form that is easy for computers to process.
•  Natural language generation systems
   –  Convert computer program data into natural language.
•  https://zh.wikipedia.org/wiki/%E8%87%AA%E7%84%B6%E8%AF%AD%E8%A8%80%E5%A4%84%E7%90%86

Semantic Understanding

https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf

Machine Translation

http://arxiv.org/abs/1409.0473

Poetry Generation

http://emnlp2014.org/papers/pdf/EMNLP2014074.pdf

Image Caption Generation

http://arxiv.org/pdf/1411.4555v2.pdf

Visual Question Answering

http://arxiv.org/pdf/1505.00468v6.pdf

The Word2vec Neural Network

Word Semantics

•  The meaning of a word can be inferred from the contexts it appears in.

dog and cat have similar meanings:

The dog run. A cat run. A dog sleep. The cat sleep. A dog bark. The cat meows.

Semantic Vectors

The dog run. A cat run. A dog sleep. The cat sleep. A dog bark. The cat meows.

        the   a   run   sleep   bark   meow
dog      1    2    2      2      1      0
cat      2    1    2      2      0      1
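As a minimal sketch of how such count vectors can be built (the variable names and the per-sentence context window are illustrative choices, not from the slides), counting context words over the toy corpus could look like this:

from collections import Counter

# Toy corpus from the slide above.
sentences = [
    "the dog run", "a cat run", "a dog sleep",
    "the cat sleep", "a dog bark", "the cat meows",
]

# For each word, count every other word appearing in the same sentence.
context_counts = {}
for sentence in sentences:
    words = sentence.split()
    for i, word in enumerate(words):
        counts = context_counts.setdefault(word, Counter())
        for j, other in enumerate(words):
            if i != j:
                counts[other] += 1

print(context_counts["dog"])  # context counts for "dog"
print(context_counts["cat"])  # context counts for "cat"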

Semantic Vectors

dog (1, 2, ..., xn)
cat (2, 1, ..., xn)
car (0, 0, ..., xn)

Semantic Vector Similarity

•  The cosine similarity of A and B is:

$$\frac{A \cdot B}{|A||B|}$$

For dog = (a1, a2, ..., an) and cat = (b1, b2, ..., bn), the cosine similarity of dog and cat is:

$$\frac{a_1 b_1 + a_2 b_2 + \cdots + a_n b_n}{\sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}\;\sqrt{b_1^2 + b_2^2 + \cdots + b_n^2}}$$
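A minimal sketch of this computation, using the count vectors from the table above (numpy is an implementation choice, not part of the slides):

import numpy as np

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

dog = np.array([1.0, 2.0, 2.0, 2.0, 1.0, 0.0])  # (the, a, run, sleep, bark, meow)
cat = np.array([2.0, 1.0, 2.0, 2.0, 0.0, 1.0])

print(cosine_similarity(dog, cat))  # about 0.857: dog and cat share most contexts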

Semantic Vector Arithmetic

Woman + King - Man = Queen

(Figure: in the embedding space, the offset King - Man is roughly parallel to Queen - Woman, so adding it to Woman lands near Queen.)

Semantic Vectors Are Too High-Dimensional

dog (x1 = the, x2 = a, ..., xn)

The dimensionality of a semantic vector equals the total vocabulary size.

The Word2vec Neural Network

(Figure: the one-hot encoding of dog, [1, 0, 0, 0], is fed into the word2vec network, which outputs a compressed semantic vector such as [1.2, 0.7, 0.5].)

One-Hot Encoding

With the vocabulary {dog, cat, run, fly}, each word is a one-hot vector; e.g. dog = [1, 0, 0, 0].

Initialize Weights

Two weight matrices are initialized, one row per vocabulary word (rows correspond to dog, cat, run, fly):

$$W = \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \\ w_{41} & w_{42} & w_{43} \end{bmatrix} \qquad V = \begin{bmatrix} v_{11} & v_{12} & v_{13} \\ v_{21} & v_{22} & v_{23} \\ v_{31} & v_{32} & v_{33} \\ v_{41} & v_{42} & v_{43} \end{bmatrix}$$

Compressing the Semantic Vector

(Figure: multiplying dog's high-dimensional one-hot vector by V selects a single row, V1 = (v11, v12, v13), yielding a low-dimensional dense vector.)

Compressed Vectors

Each word's compressed vector is one row of the weight matrices:

dog = (v11, v12, v13)
cat = (v21, v22, v23)
run = (w31, w32, w33)
fly = (w41, w42, w43)

Context Word: dog and run

run appears in the context of dog, so training pushes the predicted probability toward 1:

$$V_1 \cdot W_3 = v_{11}w_{31} + v_{12}w_{32} + v_{13}w_{33}$$

$$\frac{1}{1 + e^{-V_1 \cdot W_3}} \approx 1$$

Context Word: cat and run

run also appears in the context of cat:

$$V_2 \cdot W_3 = v_{21}w_{31} + v_{22}w_{32} + v_{23}w_{33}$$

$$\frac{1}{1 + e^{-V_2 \cdot W_3}} \approx 1$$

Non-context Word: dog and fly

fly does not appear in the context of dog, so training pushes the predicted probability toward 0:

$$V_1 \cdot W_4 = v_{11}w_{41} + v_{12}w_{42} + v_{13}w_{43}$$

$$\frac{1}{1 + e^{-V_1 \cdot W_4}} \approx 0$$

Non-context Word: cat and fly

$$V_2 \cdot W_4 = v_{21}w_{41} + v_{22}w_{42} + v_{23}w_{43}$$

$$\frac{1}{1 + e^{-V_2 \cdot W_4}} \approx 0$$

Result

(Figure: the trained compressed vectors: dog = (v11, v12, v13), cat = (v21, v22, v23), run = (w31, w32, w33), fly = (w41, w42, w43). Words that share contexts, like dog and cat, end up with similar vectors.)

Hands-on Semantic Computation

Hands-on Semantic Computation
https://github.com/ckmarkoh/ntc_deeplearning_tensorflow/blob/master/sec4/semantics.ipynb

Training Data

anarchism originated as a term of abuse first used against early working class radicals including the diggers of the english revolution and the sans culottes of the french revolution whilst the term is still used in a pejorative way to describe any act that used violent means to destroy the organization of society it has also been taken up as a positive label by self defined anarchists the word anarchism is derived from the greek without archons ruler chief king anarchism as a political philosophy is the belief that rulers are unnecessary and should be abolished although there are differing interpretations of what this means anarchism also refers to related social movements that advocate the elimination of authoritarian institutions particularly the state the word anarchy as most anarchists use it does not imply chaos nihilism or anomie but rather a harmonious anti authoritarian society in place of what

Preprocessing

Raw text:

anarchism originated as a term of abuse first used against early working class radicals including the diggers of the english revolution and the sans culottes of the french revolution whilst the term is still used in a pejorative way to describe any act that used violent means to destroy the organization of society it has also been taken up ….

Tokenized into a word list:

['anarchism', 'originated', 'as', 'a', 'term', 'of', 'abuse', 'first', 'used', 'against', 'early', 'working', 'class', 'radicals', 'including', 'the', 'diggers', 'of', 'the', 'english', 'revolution', 'and', 'the', 'sans', 'culottes', 'of', 'the', 'french', 'revolution', 'whilst', 'the', 'term', 'is', 'still', 'used', 'in', 'a', 'pejorative', 'way', 'to', 'describe', 'any', 'act', 'that', 'used', 'violent', 'means', 'to', 'destroy', 'the'... ]

Preprocessing

Build a dictionary from word frequency:

{"UNK": 0, "the": 1, "of": 2, "and": 3, "one": 4, "in": 5, "a": 6, "to": 7, "zero": 8, "nine": 9, .... }

# dictionary size
vocabulary_size = 50000

Words outside the dictionary are replaced with UNK:

'the', 'english', 'revolution', 'and', 'the', 'sans', 'culottes', 'of', 'the', 'french', 'revolution'…
→ 'the', 'english', 'revolution', 'and', 'the', 'sans', UNK, 'of', 'the', 'french', 'revolution'…

Each word is then converted to its dictionary id:

→ 1, 103, 855, 3, 1, 15068, 0, 2, 1, 151, 855, …
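The notebook's preprocessing follows the standard TensorFlow word2vec example; the sketch below shows the idea (the function and variable names are illustrative, not necessarily the notebook's):

import collections

def build_dataset(words, vocabulary_size=50000):
    # Keep the (vocabulary_size - 1) most frequent words;
    # every other word maps to UNK, which gets id 0.
    count = [('UNK', 0)]
    count.extend(collections.Counter(words).most_common(vocabulary_size - 1))
    dictionary = {word: i for i, (word, _) in enumerate(count)}
    data = [dictionary.get(word, 0) for word in words]  # 0 is the id of UNK
    reverse_dictionary = {i: word for word, i in dictionary.items()}
    return data, dictionary, reverse_dictionary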

Preprocessing

Word ids: 5239, 3084, 12, 6, 195, 2, 3137, 46, 59, 156, 128, 742, 477, 10572, 134, 1, 27549, 2, 1, 103, 855, 3, 1, 15068, 0, 2, 1, 151, 855, …

Skip-gram (input, output) pairs fed to the word2vec network, pairing each center word with its neighbors:

input   output
3084    5239
3084    12
12      3084
12      6
6       12
6       195
195     6
195     2

Preprocessing

5239, 3084, 12, 6, 195, 2, 3137, 46, 59, 156, 128, 742, 477, 10572, 134, 1, 27549, 2, 1, 103, 855, 3, 1, 15068, 0, 2, 1, 151, 855, …

generate_batch(batch_size=8, num_skips=2, skip_window=1)

input:  3084  3084    12    12     6     6   195   195
output: 5239    12  3084     6    12   195     6     2

skip_window=1 looks one word to each side of the center word; num_skips=2 generates two (input, output) pairs per center word; batch_size=8 is the number of pairs per batch.
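The notebook's generate_batch keeps its own cursor into the data stream and samples context words randomly; the deterministic sketch below (the data and start parameters are illustrative additions) reproduces the batch shown above:

import numpy as np

def generate_batch(data, start=0, batch_size=8, num_skips=2, skip_window=1):
    # For each center word, emit num_skips (input, output) pairs taken
    # from the skip_window words on each side.
    assert num_skips <= 2 * skip_window
    inputs = np.zeros(batch_size, dtype=np.int32)
    labels = np.zeros((batch_size, 1), dtype=np.int32)
    n = 0
    center = start + skip_window
    while n < batch_size:  # assumes data is long enough for the batch
        offsets = [o for o in range(-skip_window, skip_window + 1) if o != 0]
        for o in offsets[:num_skips]:
            inputs[n], labels[n, 0] = data[center], data[center + o]
            n += 1
            if n == batch_size:
                break
        center += 1
    return inputs, labels

data = [5239, 3084, 12, 6, 195, 2, 3137]
print(generate_batch(data))  # inputs [3084 3084 12 ...], labels [[5239] [12] ...]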

Computational Graph

train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])

with tf.device('/cpu:0'):
    embeddings = tf.Variable(
        tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    embed = tf.nn.embedding_lookup(embeddings, train_inputs)
    nce_weights = tf.Variable(
        tf.truncated_normal([vocabulary_size, embedding_size],
                            stddev=1.0 / math.sqrt(embedding_size)))
    nce_biases = tf.Variable(tf.zeros([vocabulary_size]))
    loss = tf.reduce_mean(
        tf.nn.nce_loss(nce_weights, nce_biases, embed, train_labels,
                       num_sampled, vocabulary_size))

optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(loss)

Device

with tf.device('/cpu:0')

Runs the computational graph defined inside it on the CPU. Because TensorFlow does not support running embedding_lookup on the GPU, it must be placed on the CPU.

Inputs & Outputs

train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])

train_inputs: 3084, 3084, 12, 12, 6, 6, 195, 195
train_labels: 5239, 12, 3084, 6, 12, 195, 6, 2

Embedding Lookup

embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
embed = tf.nn.embedding_lookup(embeddings, train_inputs)

(Figure: for a train_inputs entry of 2, embedding_lookup selects row 2 of the embeddings matrix.)
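Embedding lookup is simply row selection; a toy numpy illustration (the numbers are made up):

import numpy as np

embeddings = np.array([[0.1, 0.2, 0.3],    # row 0
                       [0.4, 0.5, 0.6],    # row 1
                       [0.7, 0.8, 0.9]])   # row 2
train_inputs = np.array([2, 0])
print(embeddings[train_inputs])  # rows 2 and 0, the same result as embedding_lookup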

NCE Weights

•  NCE: Noise Contrastive Estimation

nce_weights = tf.Variable(
    tf.truncated_normal([vocabulary_size, embedding_size],
                        stddev=1.0 / math.sqrt(embedding_size)))
nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

NCE Loss

loss = tf.reduce_mean(
    tf.nn.nce_loss(nce_weights, nce_biases, embed, train_labels,
                   num_sampled, vocabulary_size))

For a positive (context) pair the loss pushes the predicted probability toward 1, and for each sampled negative (non-context) pair toward 0:

$$\frac{1}{1 + e^{-V_2 \cdot W_3}} \approx 1 \qquad \frac{1}{1 + e^{-V_2 \cdot W_4}} \approx 0$$

Positive and Negative Cost

$$\text{cost} = \log\left(\frac{1}{1 + e^{-v_I^T w_{pos}}}\right) + \sum_{neg} \log\left(1 - \frac{1}{1 + e^{-v_I^T w_{neg}}}\right)$$

where v_I is the input word's embedding, w_pos the weights of an observed context word, and w_neg the weights of sampled non-context words.
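A simplified numpy rendering of this objective (tf.nn.nce_loss additionally handles the negative sampling and biases internally; v_i, w_pos, and w_negs here are hypothetical vectors):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def skipgram_objective(v_i, w_pos, w_negs):
    # log sigmoid(v_i . w_pos) + sum over negatives of log(1 - sigmoid(v_i . w_neg));
    # training maximizes this (equivalently, minimizes its negation as a loss).
    positive = np.log(sigmoid(np.dot(v_i, w_pos)))
    negative = sum(np.log(1.0 - sigmoid(np.dot(v_i, w))) for w in w_negs)
    return positive + negative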

Train

feed_dict = {train_inputs: batch_inputs,
             train_labels: batch_labels}
_, loss_val = session.run([optimizer, loss], feed_dict=feed_dict)

batch_inputs: 3084, 3084, 12, 12, 6, 6, 195, 195
batch_labels: 5239, 12, 3084, 6, 12, 195, 6, 2

loss_val is the loss value for this batch.
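Putting it together, the training loop looks roughly like this (a sketch: num_steps and the reporting interval are illustrative, and generate_batch refers to the notebook's batching function shown earlier):

num_steps = 100001
with tf.Session() as session:
    tf.initialize_all_variables().run()  # variable initializer in the TF 0.x API
    average_loss = 0.0
    for step in range(num_steps):
        batch_inputs, batch_labels = generate_batch(
            batch_size=8, num_skips=2, skip_window=1)
        feed_dict = {train_inputs: batch_inputs, train_labels: batch_labels}
        _, loss_val = session.run([optimizer, loss], feed_dict=feed_dict)
        average_loss += loss_val
        if step > 0 and step % 2000 == 0:
            print("step %d: average loss %.3f" % (step, average_loss / 2000))
            average_loss = 0.0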

Result

final_embeddings:

array([[-0.02782757, -0.16879494, -0.06111901, ..., -0.25700757, -0.07137159, 0.0191142 ], [-0.00155336, -0.00928817, -0.0535327 , ..., -0.23261793, -0.13980433, 0.18055709], [ 0.02576068, -0.06805354, -0.03688766, ..., -0.15378961, 0.00459271, 0.0717089 ], ..., [ 0.01061165, -0.09820389, -0.09913248, ..., 0.00818674, -0.12992384, 0.05826835], [ 0.0849214 , -0.14137401, 0.09674817, ..., 0.04111136, -0.05420518, -0.01920278], [ 0.08318492, -0.08202577, 0.11284919, ..., 0.03887166, 0.01556483, 0.12496017]], dtype=float32)

Visualization

(Figure: two-dimensional visualization of the learned embeddings, from the notebook; omitted here.)

Most Similar Words

def get_most_similar(word, top=10):
    wid = dictionary.get(word, -1)
    result = np.dot(final_embeddings[wid:wid+1, :], final_embeddings.T)
    result = result[0].argsort().tolist()
    result.reverse()
    for idx in result[:top]:
        print(reverse_dictionary[idx])

get_most_similar("one")

one
six
two
four
seven
three
...
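The same embeddings support the vector arithmetic from earlier (Woman + King - Man = Queen). A sketch, assuming (as in the notebook) that final_embeddings are length-normalized so a dot product acts as cosine similarity; the analogy helper itself is not in the slides:

import numpy as np

def analogy(a, b, c, top=5):
    # Find the words whose embedding is closest to emb(b) - emb(a) + emb(c).
    v = (final_embeddings[dictionary[b]]
         - final_embeddings[dictionary[a]]
         + final_embeddings[dictionary[c]])
    scores = np.dot(final_embeddings, v)
    best = scores.argsort()[::-1][:top]
    return [reverse_dictionary[i] for i in best]

print(analogy("man", "king", "woman"))  # ideally includes "queen"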

About the Instructor

Mark Chang

•  Email: ckmarkoh at gmail dot com
•  Blog: http://cpmarkchang.logdown.com
•  Github: https://github.com/ckmarkoh
•  Facebook: https://www.facebook.com/ckmarkoh.chang
•  Slideshare: http://www.slideshare.net/ckmarkohchang
•  Linkedin: https://www.linkedin.com/pub/mark-chang/85/25b/847
