
Page 1: Computational Linguistics  week 5

Computational Linguistics Week 5

Neural Networks and Neural Language Models

By Mark Chang

Page 2: Computational Linguistics  week 5

Outline

•  Machine Learning
•  Neural Networks
•  Training Neural Networks
•  Vector Space of Semantics
•  Neural Language Models (word2vec)

Page 3: Computational Linguistics  week 5

Machine Learning

Page 4: Computational Linguistics  week 5

Machine Learning

[Diagram: the machine learning workflow.
Training: Training Data → Machine Learning Model → Output; the Output is compared with the Answer, and the Error is fed back to the model.
After training: Testing Data → Machine Learning Model → Output.]

Page 5: Computational Linguistics  week 5

Machine Learning

[Diagram: the same workflow with notation.]
•  Training Data: $X, Y$, with examples $x^{(i)}, y^{(i)}$
•  Model: $h$, with parameter $w$
•  Output: $h(X)$
•  Answer: $Y$
•  Cost Function: $E(h(X), Y)$, which provides the feedback for updating $w$

Page 6: Computational Linguistics  week 5

Logistic Regression

Page 7: Computational Linguistics  week 5

Training Data

X             Y
-0.47241379   0
-0.35344828   0
-0.30148276   0
 0.33448276   1
 0.35344828   1
 0.37241379   1
 0.39137931   1
 0.41034483   1
 0.44931034   1
 0.49827586   1
 0.51724138   1
 ….           ….

Page 8: Computational Linguistics  week 5

Model

Sigmoid function:

$h(x) = \dfrac{1}{1 + e^{-(w_0 + w_1 x)}}$

$w_0 + w_1 x < 0 \Rightarrow h(x) \approx 0$
$w_0 + w_1 x > 0 \Rightarrow h(x) \approx 1$
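To make the model concrete, here is a minimal sketch in Python (NumPy assumed; the parameter values are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(t):
    # Logistic (sigmoid) function: maps any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-t))

def h(x, w0, w1):
    # Logistic regression hypothesis: h(x) = 1 / (1 + e^-(w0 + w1*x)).
    return sigmoid(w0 + w1 * x)

# Illustrative parameters, chosen so the decision boundary sits near x = 0:
w0, w1 = 0.0, 20.0
print(h(-0.47, w0, w1))  # w0 + w1*x < 0, so h(x) is close to 0
print(h(0.33, w0, w1))   # w0 + w1*x > 0, so h(x) is close to 1
```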

Page 9: Computational Linguistics  week 5

Cost Function

•  Cross Entropy

$E(h(X), Y) = -\dfrac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log(h(x^{(i)})) + (1 - y^{(i)})\log(1 - h(x^{(i)}))\right)$

If $y^{(i)} = 1$: $E(h(x^{(i)}), y^{(i)}) = -\log(h(x^{(i)}))$

$h(x^{(i)}) \approx 0 \Rightarrow E(h(x^{(i)}), y^{(i)}) \approx \infty$
$h(x^{(i)}) \approx 1 \Rightarrow E(h(x^{(i)}), y^{(i)}) \approx 0$

If $y^{(i)} = 0$: $E(h(x^{(i)}), y^{(i)}) = -\log(1 - h(x^{(i)}))$

$h(x^{(i)}) \approx 0 \Rightarrow E(h(x^{(i)}), y^{(i)}) \approx 0$
$h(x^{(i)}) \approx 1 \Rightarrow E(h(x^{(i)}), y^{(i)}) \approx \infty$

Page 10: Computational Linguistics  week 5

Cost Function

•  Cross Entropy

$E(h(X), Y) = -\dfrac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log(h(x^{(i)})) + (1 - y^{(i)})\log(1 - h(x^{(i)}))\right)$

$h(x^{(i)}) \approx 0$ and $y^{(i)} = 0 \Rightarrow E(h(X), Y) \approx 0$
$h(x^{(i)}) \approx 1$ and $y^{(i)} = 1 \Rightarrow E(h(X), Y) \approx 0$
$h(x^{(i)}) \approx 0$ and $y^{(i)} = 1 \Rightarrow E(h(X), Y) \approx \infty$
$h(x^{(i)}) \approx 1$ and $y^{(i)} = 0 \Rightarrow E(h(X), Y) \approx \infty$
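A short sketch of this cost in Python (NumPy assumed; the eps clipping is my addition to avoid log(0), not part of the slides):

```python
import numpy as np

def cross_entropy(h_x, y, eps=1e-12):
    # E(h(X), Y) = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) )
    h_x = np.clip(h_x, eps, 1.0 - eps)  # numerical safeguard against log(0)
    return -np.mean(y * np.log(h_x) + (1 - y) * np.log(1 - h_x))

y = np.array([0, 0, 1, 1])
print(cross_entropy(np.array([0.01, 0.02, 0.98, 0.99]), y))  # near 0: good fit
print(cross_entropy(np.array([0.99, 0.98, 0.02, 0.01]), y))  # large: bad fit
```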

Page 11: Computational Linguistics  week 5

Feedback

•  Gradient Descent:

$w_0 \leftarrow w_0 - \eta\dfrac{\partial E(h(X), Y)}{\partial w_0}$

$w_1 \leftarrow w_1 - \eta\dfrac{\partial E(h(X), Y)}{\partial w_1}$

[Figure: the error surface over $(w_0, w_1)$; each update steps in the direction $\left(-\dfrac{\partial E(h(X), Y)}{\partial w_0}, -\dfrac{\partial E(h(X), Y)}{\partial w_1}\right)$]
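Putting the model, cost, and feedback together, a minimal gradient-descent loop for this logistic regression might look as follows (a sketch: the closed-form gradients come from differentiating the cross entropy, and the learning rate and iteration count are illustrative):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Toy data in the spirit of Page 7: negative x -> class 0, positive x -> class 1.
X = np.array([-0.47, -0.35, -0.30, 0.33, 0.35, 0.37, 0.39, 0.41, 0.45, 0.50])
Y = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1])

w0, w1, eta = 0.0, 0.0, 1.0
for _ in range(5000):
    h = sigmoid(w0 + w1 * X)
    # For cross entropy: dE/dw0 = mean(h - y), dE/dw1 = mean((h - y) * x).
    w0 -= eta * np.mean(h - Y)
    w1 -= eta * np.mean((h - Y) * X)

print(w0, w1)                  # learned parameters
print(sigmoid(w0 + w1 * 0.4))  # close to 1 for a positive example
```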

Page 12: Computational Linguistics  week 5

Feedback  

Page 13: Computational Linguistics  week 5

Neural  Networks  

Page 14: Computational Linguistics  week 5

Neurons & Action Potential

http://humanphisiology.wikispaces.com/file/view/neuron.png/216460814/neuron.png

http://upload.wikimedia.org/wikipedia/commons/thumb/4/4a/Action_potential.svg/1037px-Action_potential.svg.png

Page 15: Computational Linguistics  week 5

Synapse  

http://www.quia.com/files/quia/users/lmcgee/Systems/endocrine-nervous/synapse.gif

Page 16: Computational Linguistics  week 5

Artificial Neurons

[Diagram: a neuron $n$ with inputs $x_1, x_2$ weighted by $w_1, w_2$, a bias input $b$ weighted by $w_b$, and output $y$]

$n_{in} = w_1 x_1 + w_2 x_2 + w_b$

$n_{out} = \dfrac{1}{1 + e^{-n_{in}}}$

$y = \dfrac{1}{1 + e^{-(w_1 x_1 + w_2 x_2 + w_b)}}$

Page 17: Computational Linguistics  week 5

Artificial Neurons

$n_{in} = w_1 x_1 + w_2 x_2 + w_b$, $n_{out} = \dfrac{1}{1 + e^{-n_{in}}}$

[Figure: the line $w_1 x_1 + w_2 x_2 + w_b = 0$ in the $(x_1, x_2)$ plane, where $n_{out} = 0.5$. On the side where $w_1 x_1 + w_2 x_2 + w_b > 0$, $n_{out} \approx 1$; on the side where $w_1 x_1 + w_2 x_2 + w_b < 0$, $n_{out} \approx 0$.]

Page 18: Computational Linguistics  week 5

Binary Classification: AND Gate

x1  x2  y
0   0   0
0   1   0
1   0   0
1   1   1

[Diagram: a single neuron with weights 20 ($x_1$), 20 ($x_2$) and bias weight -30]

$y = \dfrac{1}{1 + e^{-(20 x_1 + 20 x_2 - 30)}}$

[Figure: the boundary $20 x_1 + 20 x_2 - 30 = 0$ separates $(1,1)$, labeled 1, from $(0,0)$, $(0,1)$, $(1,0)$, labeled 0]

Page 19: Computational Linguistics  week 5

Binary Classification: OR Gate

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   1

[Diagram: a single neuron with weights 20 ($x_1$), 20 ($x_2$) and bias weight -10]

$y = \dfrac{1}{1 + e^{-(20 x_1 + 20 x_2 - 10)}}$

[Figure: the boundary $20 x_1 + 20 x_2 - 10 = 0$ separates $(0,0)$, labeled 0, from $(0,1)$, $(1,0)$, $(1,1)$, labeled 1]

Page 20: Computational Linguistics  week 5

XOR Gate?

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0

[Figure: $(0,0)$ and $(1,1)$ are labeled 0, while $(0,1)$ and $(1,0)$ are labeled 1; no single line can separate the two classes, so one neuron is not enough]

Page 21: Computational Linguistics  week 5

Binary Classification: XOR Gate

[Diagram: a two-layer network.
Hidden neuron n1: weights 20, 20, bias weight -30 (the AND gate from Page 18).
Hidden neuron n2: weights 20, 20, bias weight -10 (the OR gate from Page 19).
Output neuron y: weight -20 from n1, weight 20 from n2, bias weight -10.]

x1  x2  n1  n2  y
0   0   0   0   0
0   1   0   1   1
1   0   0   1   1
1   1   1   1   0

[Figure: n1 and n2 each draw one boundary line in the $(x_1, x_2)$ plane; their combination separates the XOR classes]
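A quick numeric check of this two-layer solution in Python (a sketch using the weights from the slide; NumPy assumed):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def xor_net(x1, x2):
    # Hidden layer: n1 is the AND gate, n2 the OR gate (weights from Pages 18-19).
    n1 = sigmoid(20 * x1 + 20 * x2 - 30)
    n2 = sigmoid(20 * x1 + 20 * x2 - 10)
    # Output: fires when n2 is on but n1 is off, i.e. OR-but-not-AND = XOR.
    return sigmoid(-20 * n1 + 20 * n2 - 10)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xor_net(x1, x2)))  # reproduces the XOR truth table
```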

Page 22: Computational Linguistics  week 5

Neural Networks

[Diagram: a fully connected feed-forward network.
Input Layer: x, y, and a bias b.
Hidden Layer: n11, n12, with weights W11,x, W11,y, W11,b and W12,x, W12,y, W12,b, plus a bias b.
Output Layer: n21, n22, with weights W21,11, W21,12, W21,b and W22,11, W22,12, W22,b; targets z1, z2.]

Page 23: Computational Linguistics  week 5

Visual Pathway

http://www.nature.com/neuro/journal/v8/n8/images/nn0805-975-F1.jpg  

Page 24: Computational Linguistics  week 5

Training Neural Networks

Page 25: Computational Linguistics  week 5

Training Neural Networks

[Diagram: Training Data → Neural Networks → Output, compared with the Answer]

Initialization → Forward Propagation → Error Function → Backward Propagation

Page 26: Computational Linguistics  week 5

Initialization

•  Randomly sample each weight W from the interval -N ~ N

[Diagram: the network from Page 22, with every weight initialized this way]
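A minimal sketch of this step in Python (NumPy assumed; the layer shapes follow the 2-2-2 network above, and the bound N = 0.5 is an arbitrary illustrative choice):

```python
import numpy as np

N = 0.5  # sampling bound (illustrative choice)
rng = np.random.default_rng(0)

# Hidden layer: 2 neurons, each with weights for x, y and a bias weight.
W_hidden = rng.uniform(-N, N, size=(2, 3))
# Output layer: 2 neurons, each with weights for n11, n12 and a bias weight.
W_output = rng.uniform(-N, N, size=(2, 3))

print(W_hidden)
print(W_output)
```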

Page 27: Computational Linguistics  week 5

Forward Propagation

Page 28: Computational Linguistics  week 5

Error Function

$J = -(z_1\log(n_{21(out)}) + (1 - z_1)\log(1 - n_{21(out)})) - (z_2\log(n_{22(out)}) + (1 - z_2)\log(1 - n_{22(out)}))$

[Diagram: outputs $n_{21}, n_{22}$ are compared with targets $z_1, z_2$]

For each output neuron:
$n_{out} \approx 0$ and $z = 0 \Rightarrow J \approx 0$
$n_{out} \approx 1$ and $z = 1 \Rightarrow J \approx 0$
$n_{out} \approx 0$ and $z = 1 \Rightarrow J \approx \infty$
$n_{out} \approx 1$ and $z = 0 \Rightarrow J \approx \infty$

Page 29: Computational Linguistics  week 5

Gradient Descent

Output-layer weights:

$w_{21,11} \leftarrow w_{21,11} - \eta\dfrac{\partial J}{\partial w_{21,11}} \quad w_{21,12} \leftarrow w_{21,12} - \eta\dfrac{\partial J}{\partial w_{21,12}} \quad w_{21,b} \leftarrow w_{21,b} - \eta\dfrac{\partial J}{\partial w_{21,b}}$

$w_{22,11} \leftarrow w_{22,11} - \eta\dfrac{\partial J}{\partial w_{22,11}} \quad w_{22,12} \leftarrow w_{22,12} - \eta\dfrac{\partial J}{\partial w_{22,12}} \quad w_{22,b} \leftarrow w_{22,b} - \eta\dfrac{\partial J}{\partial w_{22,b}}$

Hidden-layer weights:

$w_{11,x} \leftarrow w_{11,x} - \eta\dfrac{\partial J}{\partial w_{11,x}} \quad w_{11,y} \leftarrow w_{11,y} - \eta\dfrac{\partial J}{\partial w_{11,y}} \quad w_{11,b} \leftarrow w_{11,b} - \eta\dfrac{\partial J}{\partial w_{11,b}}$

$w_{12,x} \leftarrow w_{12,x} - \eta\dfrac{\partial J}{\partial w_{12,x}} \quad w_{12,y} \leftarrow w_{12,y} - \eta\dfrac{\partial J}{\partial w_{12,y}} \quad w_{12,b} \leftarrow w_{12,b} - \eta\dfrac{\partial J}{\partial w_{12,b}}$

[Figure: each update steps in the negative gradient direction $\left(-\dfrac{\partial J}{\partial w_0}, -\dfrac{\partial J}{\partial w_1}\right)$ on the error surface]

Page 30: Computational Linguistics  week 5

Backward Propagation

http://cpmarkchang.logdown.com/posts/277349-neural-network-backward-propagation  

Page 31: Computational Linguistics  week 5

Vector Space of Semantics

Page 32: Computational Linguistics  week 5

Distributional Semantics

•  The meaning of a word can be inferred from its context.

The meanings of "dog" and "cat" are similar, because they appear in similar contexts:

The dog run. A cat run. A dog sleep. The cat sleep. A dog bark. The cat meows.

Page 33: Computational Linguistics  week 5

Semantic Vectors

The dog run. A cat run. A dog sleep. The cat sleep. A dog bark. The cat meows.

       the  a   run  sleep  bark  meow
dog    1    2   2    2      1     0
cat    2    1   2    2      0     1

Page 34: Computational Linguistics  week 5

Semantic Vectors

dog  $(1, 2, \dots, x_n)$
cat  $(2, 1, \dots, x_n)$
car  $(0, 0, \dots, x_n)$

Page 35: Computational Linguistics  week 5

Cosine Similarity

•  The cosine similarity between A and B is: $\dfrac{A \cdot B}{|A||B|}$

dog  $(a_1, a_2, \dots, a_n)$
cat  $(b_1, b_2, \dots, b_n)$

The cosine similarity between dog and cat is:

$\dfrac{a_1 b_1 + a_2 b_2 + \dots + a_n b_n}{\sqrt{a_1^2 + a_2^2 + \dots + a_n^2}\,\sqrt{b_1^2 + b_2^2 + \dots + b_n^2}}$
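A quick sketch computing this for the dog and cat count vectors from Page 33 (NumPy assumed):

```python
import numpy as np

# Count vectors from Page 33, over the contexts (the, a, run, sleep, bark, meow).
dog = np.array([1, 2, 2, 2, 1, 0])
cat = np.array([2, 1, 2, 2, 0, 1])

def cosine_similarity(a, b):
    # A·B / (|A| |B|)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(dog, cat))  # ~0.857, so dog and cat are similar
```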

Page 36: Computational Linguistics  week 5

Operation of Vectors

Woman + King - Man = Queen

[Figure: Man → King and Woman → Queen differ by (approximately) the same vector, King - Man]
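A small sketch of this analogy search (the 2-D vectors here are hypothetical, chosen so the relation holds exactly; real word2vec vectors only satisfy it approximately):

```python
import numpy as np

# Hypothetical embeddings where the gender offset (second axis) is shared.
vec = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([1.0, 1.0]),
    "king":  np.array([3.0, 0.0]),
    "queen": np.array([3.0, 1.0]),
}

target = vec["king"] - vec["man"] + vec["woman"]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Nearest word to the target, excluding the three query words:
best = max((w for w in vec if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(vec[w], target))
print(best)  # queen
```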

Page 37: Computational Linguistics  week 5

Neural Language Models (word2vec)

Page 38: Computational Linguistics  week 5

Dimension is too LARGE

dog  $(x_1 = the,\ x_2 = a,\ \dots,\ x_n)$

The dimension of a semantic vector is equal to the size of the vocabulary.

[Figure: a long vector with components $x_1, x_2, x_3, x_4, \dots, x_n$]

Page 39: Computational Linguistics  week 5

Compressed Vectors

[Diagram: "dog" → One-Hot Encoding (1, 0, 0, 0) → Neural Network → Compressed Vector (1.2, 0.7, 0.5)]

Page 40: Computational Linguistics  week 5

One-Hot Encoding

[Diagram: over the vocabulary (dog, cat, run, fly), the one-hot vector for "dog" has a 1 in the dog position and 0 everywhere else]

Page 41: Computational Linguistics  week 5

Initialize Weights

[Diagram: each vocabulary word (dog, cat, run, fly) has a row in V (input vectors) and a row in W (output vectors)]

$W = \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \\ w_{41} & w_{42} & w_{43} \end{bmatrix} \quad V = \begin{bmatrix} v_{11} & v_{12} & v_{13} \\ v_{21} & v_{22} & v_{23} \\ v_{31} & v_{32} & v_{33} \\ v_{41} & v_{42} & v_{43} \end{bmatrix}$

Page 42: Computational Linguistics  week 5

Compressed Vectors

[Diagram: the one-hot vector for "dog" (high dimension) selects the first row of V, producing the compressed vector $(v_{11}, v_{12}, v_{13})$ (low dimension)]
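A sketch of this lookup in Python (NumPy assumed; the random V is illustrative). Multiplying a one-hot vector by V simply selects one row, which is why the hidden layer can be implemented as a table lookup:

```python
import numpy as np

vocab = ["dog", "cat", "run", "fly"]
rng = np.random.default_rng(0)
V = rng.uniform(-1, 1, size=(4, 3))  # one 3-dimensional input vector per word

one_hot_dog = np.array([1, 0, 0, 0])
print(one_hot_dog @ V)  # the matrix product with a one-hot vector...
print(V[0])             # ...equals reading out row 0 of V directly
```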

Page 43: Computational Linguistics  week 5

Compressed Vectors

[Diagram: each word maps to its row of V.
dog → $(v_{11}, v_{12}, v_{13})$, cat → $(v_{21}, v_{22}, v_{23})$, run → $(v_{31}, v_{32}, v_{33})$, fly → $(v_{41}, v_{42}, v_{43})$]

Page 44: Computational Linguistics  week 5

Context Word

[Diagram: "run" appears in the context of "dog". The input vector of dog is $V_1 = (v_{11}, v_{12}, v_{13})$; the output vector of run is $W_3 = (w_{31}, w_{32}, w_{33})$.]

$V_1 \cdot W_3 = v_{11}w_{31} + v_{12}w_{32} + v_{13}w_{33}$

Training pushes $\dfrac{1}{1 + e^{-V_1 \cdot W_3}} \approx 1$

Page 45: Computational Linguistics  week 5

Context Word

[Diagram: "run" also appears in the context of "cat". The input vector of cat is $V_2 = (v_{21}, v_{22}, v_{23})$; the output vector of run is $W_3 = (w_{31}, w_{32}, w_{33})$.]

$V_2 \cdot W_3 = v_{21}w_{31} + v_{22}w_{32} + v_{23}w_{33}$

Training pushes $\dfrac{1}{1 + e^{-V_2 \cdot W_3}} \approx 1$

Page 46: Computational Linguistics  week 5

Non-context Word

[Diagram: "fly" does not appear in the context of "dog". The input vector of dog is $V_1 = (v_{11}, v_{12}, v_{13})$; the output vector of fly is $W_4 = (w_{41}, w_{42}, w_{43})$.]

$V_1 \cdot W_4 = v_{11}w_{41} + v_{12}w_{42} + v_{13}w_{43}$

Training pushes $\dfrac{1}{1 + e^{-V_1 \cdot W_4}} \approx 0$

Page 47: Computational Linguistics  week 5

Non-context Word

[Diagram: "fly" does not appear in the context of "cat". The input vector of cat is $V_2 = (v_{21}, v_{22}, v_{23})$; the output vector of fly is $W_4 = (w_{41}, w_{42}, w_{43})$.]

$V_2 \cdot W_4 = v_{21}w_{41} + v_{22}w_{42} + v_{23}w_{43}$

Training pushes $\dfrac{1}{1 + e^{-V_2 \cdot W_4}} \approx 0$
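Pages 44-47 describe, in miniature, the skip-gram objective with negative sampling: push $\sigma(V_i \cdot W_j)$ toward 1 for observed (word, context) pairs and toward 0 for non-context pairs. A minimal sketch of such updates (NumPy assumed; the update rule is the standard logistic-loss gradient step, and the learning rate is illustrative):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

vocab = {"dog": 0, "cat": 1, "run": 2, "fly": 3}
rng = np.random.default_rng(0)
V = rng.uniform(-0.5, 0.5, size=(4, 3))  # input vectors (one row per word)
W = rng.uniform(-0.5, 0.5, size=(4, 3))  # output vectors (one row per word)
eta = 0.1

def update(word, context, label):
    # label = 1 for a context word, 0 for a non-context word.
    i, j = vocab[word], vocab[context]
    g = sigmoid(V[i] @ W[j]) - label  # gradient of the logistic loss
    V[i], W[j] = V[i] - eta * g * W[j], W[j] - eta * g * V[i]

for _ in range(200):
    update("dog", "run", 1)  # Page 44: push sigmoid(V1·W3) toward 1
    update("cat", "run", 1)  # Page 45: push sigmoid(V2·W3) toward 1
    update("dog", "fly", 0)  # Page 46: push sigmoid(V1·W4) toward 0
    update("cat", "fly", 0)  # Page 47: push sigmoid(V2·W4) toward 0

print(sigmoid(V[0] @ W[2]), sigmoid(V[0] @ W[3]))  # ~1 and ~0
```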

Page 48: Computational Linguistics  week 5

Result

[Diagram: after training, the rows of V are the word vectors.
dog → $(v_{11}, v_{12}, v_{13})$, cat → $(v_{21}, v_{22}, v_{23})$, run → $(v_{31}, v_{32}, v_{33})$, fly → $(v_{41}, v_{42}, v_{43})$; words that share contexts, like dog and cat, end up with similar vectors.]

Page 49: Computational Linguistics  week 5

Further Reading

•  Logistic Regression 3D
   – http://cpmarkchang.logdown.com/posts/189069-logisti-regression-model
•  Overfitting and Regularization
   – http://cpmarkchang.logdown.com/posts/193261-machine-learning-overfitting-and-regularization
•  Model Selection
   – http://cpmarkchang.logdown.com/posts/193914-machine-learning-model-selection
•  Neural Network Back Propagation
   – http://cpmarkchang.logdown.com/posts/277349-neural-network-backward-propagation

Page 50: Computational Linguistics  week 5

Further Reading

•  Neural Probabilistic Language Model:
   – http://cpmarkchang.logdown.com/posts/255785-neural-network-neural-probabilistic-language-model
   – http://cpmarkchang.logdown.com/posts/276263--hierarchical-probabilistic-neural-networks-neural-network-language-model
•  Word2vec
   – http://arxiv.org/pdf/1301.3781.pdf
   – http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
   – http://www-personal.umich.edu/~ronxin/pdf/w2vexp.pdf