outline - stanford nlp groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfsupervision in...

109
Outline Learning Overview Details Example Lexicon learning Supervision signals 0

Upload: others

Post on 26-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Outline

• Learning

– Overview

– Details

– Example

– Lexicon learning

– Supervision signals

0

Page 2: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Outline

• Learning

– Overview

– Details

– Example

– Lexicon learning

– Supervision signals

1

Page 3: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Supervision in syntactic parsing

Input:

S

NP

NP

ESSLLI 2016

NP

the known summer school

VP

V

is

VP

V

located

PP

in Bolzano

Output:

They play football

S

NP

They

VP

V

play

NP

football 2

Page 4: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Supervision in semantic parsing

Input:

Heavy supervision Light supervision

How tall is Lebron James?

HeightOf.LebronJames

What is Steph Curry’s daughter called?

ChildrenOf.StephCurry u Gender.Female

Youngest player of the Cavaliers

argmin(PlayerOf.Cavaliers,BirthDateOf)

...

How tall is Lebron James?

203cm

What is Steph Curry’s daughter called?

Riley Curry

Youngest player of the Cavaliers

Kyrie Irving

...

[Zelle & Mooney, 1996; Zettlemoyer & Collins, 2005; Clarke et al. 2010; Liang et al., 2011]

3

Page 5: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Supervision in semantic parsing

Input:

Heavy supervision Light supervision

How tall is Lebron James?

HeightOf.LebronJames

What is Steph Curry’s daughter called?

ChildrenOf.StephCurry u Gender.Female

Youngest player of the Cavaliers

argmin(PlayerOf.Cavaliers,BirthDateOf)

...

How tall is Lebron James?

203cm

What is Steph Curry’s daughter called?

Riley Curry

Youngest player of the Cavaliers

Kyrie Irving

...

Output:

Clay Thompson’s weight

WeightOf.ClayThompson

ClayThompson

Clay Thompson

’s Weight

weight

Weight.ClayThompson 205 lbs

[Zelle & Mooney, 1996; Zettlemoyer & Collins, 2005; Clarke et al. 2010; Liang et al., 2011]

3

Page 6: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Learning in a nutshell

utterance

0. Define model for derivations

4

Page 7: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Learning in a nutshell

utterance Parsing

...

0. Define model for derivations

1. Generate candidate derivations (later)

4

Page 8: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Learning in a nutshell

utterance Parsing

...

Label

...

0. Define model for derivations

1. Generate candidate derivations (later)

2. Label as correct and incorrect

4

Page 9: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Learning in a nutshell

utterance Parsing

...

Label

...

Update model

0. Define model for derivations

1. Generate candidate derivations (later)

2. Label as correct and incorrect

3. Update model to favor correct trees

4

Page 10: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training intuition

Where did Mozart tupress?

Vienna

5

Page 11: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training intuition

Where did Mozart tupress?

PlaceOfBirth.WolfgangMozart

PlaceOfDeath.WolfgangMozart

PlaceOfMarriage.WolfgangMozart

Vienna

5

Page 12: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training intuition

Where did Mozart tupress?

PlaceOfBirth.WolfgangMozart ⇒ Salzburg

PlaceOfDeath.WolfgangMozart ⇒ Vienna

PlaceOfMarriage.WolfgangMozart ⇒ Vienna

Vienna

5

Page 13: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training intuition

Where did Mozart tupress?

PlaceOfBirth.WolfgangMozart ⇒ Salzburg

PlaceOfDeath.WolfgangMozart ⇒ Vienna

PlaceOfMarriage.WolfgangMozart ⇒ Vienna

Vienna

5

Page 14: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training intuition

Where did Mozart tupress?

PlaceOfBirth.WolfgangMozart ⇒ Salzburg

PlaceOfDeath.WolfgangMozart ⇒ Vienna

PlaceOfMarriage.WolfgangMozart ⇒ Vienna

Vienna

Where did Hogarth tupress?

5

Page 15: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training intuition

Where did Mozart tupress?

PlaceOfBirth.WolfgangMozart ⇒ Salzburg

PlaceOfDeath.WolfgangMozart ⇒ Vienna

PlaceOfMarriage.WolfgangMozart ⇒ Vienna

Vienna

Where did Hogarth tupress?

PlaceOfBirth.WilliamHogarth

PlaceOfDeath.WilliamHogarth

PlaceOfMarriage.WilliamHogarth

London

5

Page 16: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training intuition

Where did Mozart tupress?

PlaceOfBirth.WolfgangMozart ⇒ Salzburg

PlaceOfDeath.WolfgangMozart ⇒ Vienna

PlaceOfMarriage.WolfgangMozart ⇒ Vienna

Vienna

Where did Hogarth tupress?

PlaceOfBirth.WilliamHogarth ⇒ London

PlaceOfDeath.WilliamHogarth ⇒ London

PlaceOfMarriage.WilliamHogarth ⇒ Paddington

London

5

Page 17: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training intuition

Where did Mozart tupress?

PlaceOfBirth.WolfgangMozart ⇒ Salzburg

PlaceOfDeath.WolfgangMozart ⇒ Vienna

PlaceOfMarriage.WolfgangMozart ⇒ Vienna

Vienna

Where did Hogarth tupress?

PlaceOfBirth.WilliamHogarth ⇒ London

PlaceOfDeath.WilliamHogarth ⇒ London

PlaceOfMarriage.WilliamHogarth ⇒ Paddington

London

5

Page 18: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training intuition

Where did Mozart tupress?

PlaceOfBirth.WolfgangMozart ⇒ Salzburg

PlaceOfDeath.WolfgangMozart ⇒ Vienna

PlaceOfMarriage.WolfgangMozart ⇒ Vienna

Vienna

Where did Hogarth tupress?

PlaceOfBirth.WilliamHogarth ⇒ London

PlaceOfDeath.WilliamHogarth ⇒ London

PlaceOfMarriage.WilliamHogarth ⇒ Paddington

London

5

Page 19: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Outline

• Learning

– Overview

– Details

– Example

– Lexicon learning

– Supervision signals

6

Page 20: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Constructing derivations

Type.PersonuPlaceLived.Chicago

Type.Person

people

who PlaceLived.Chicago

PlaceLived

lived in

Chicago

Chicago

lexicon

lexiconlexicon

join

intersect

7

Page 21: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Many possible derivations!

x = people who have lived in Chicago

?

set of candidate derivations D(x)

Type.PersonuPlaceLived.Chicago

Type.Person

people

who PlaceLived.Chicago

PlaceLived

lived in

Chicago

Chicago

lexicon

lexiconlexicon

join

intersectType.OrguPresentIn.ChicagoMusical

Type.Org

people

who PresentIn.ChicagoMusical

PresentIn

lived in

ChicagoMusical

Chicago

lexicon

lexiconlexicon

join

intersect

8

Page 22: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

x: utterance

d: derivation

Type.PersonuPlaceLived.Chicago

Type.Person

people

who PlaceLived.Chicago

PlaceLived

lived in

Chicago

Chicago

lexicon

lexiconlexicon

join

intersect

Feature vector and parameters in RF :

φ(x, d) θ ⇐learned

apply join 1 1.2

apply intersect 1 0.6

apply lexicon 3 2.1

lived maps to PlacesLived 1 3.1

lived maps to PlaceOfBirth 0 -0.4

born maps to PlaceOfBirth 0 2.7

... ... ...

9

Page 23: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

x: utterance

d: derivation

Type.PersonuPlaceLived.Chicago

Type.Person

people

who PlaceLived.Chicago

PlaceLived

lived in

Chicago

Chicago

lexicon

lexiconlexicon

join

intersect

Feature vector and parameters in RF :φ(x, d) θ ⇐learned

apply join 1 1.2

apply intersect 1 0.6

apply lexicon 3 2.1

lived maps to PlacesLived 1 3.1

lived maps to PlaceOfBirth 0 -0.4

born maps to PlaceOfBirth 0 2.7

... ... ...

Scoreθ(x, d) = φ(x, d)>θ =

1.2 · 1 + 0.6 · 1 + 2.1 · 3 + 3.1 · 1 +−0.4 · 0 + 2.7 · 0 + . . .9

Page 24: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Deep learning alert!

The feature vector φ(x, d) is constructed by hand

10

Page 25: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Deep learning alert!

The feature vector φ(x, d) is constructed by hand

Constructing good features is hard

10

Page 26: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Deep learning alert!

The feature vector φ(x, d) is constructed by hand

Constructing good features is hard

Algorithms are likely to do it better

10

Page 27: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Deep learning alert!

The feature vector φ(x, d) is constructed by hand

Constructing good features is hard

Algorithms are likely to do it better

Perhaps we can train φ(x, d)

φ(x, d) = Fψ(x, d), where ψ are the parameters

10

Page 28: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Log-linear model

Candidate derivations: D(x)

Model: distribution over derivations d given utterance x

pθ(d | x) = exp(Scoreθ(x,d))∑d′∈D(x) exp(Scoreθ(x,d

′))

11

Page 29: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Log-linear model

Candidate derivations: D(x)

Model: distribution over derivations d given utterance x

pθ(d | x) = exp(Scoreθ(x,d))∑d′∈D(x) exp(Scoreθ(x,d

′))

scoreθ(x, d) [1, 2, 3, 4]

pθ(d | x) [ ee+e2+e3+e4 ,

e2

e+e2+e3+e4 ,e3

e+e2+e3+e4 ,e4

e+e2+e3+e4 ]

11

Page 30: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Log-linear model

Candidate derivations: D(x)

Model: distribution over derivations d given utterance x

pθ(d | x) = exp(Scoreθ(x,d))∑d′∈D(x) exp(Scoreθ(x,d

′))

scoreθ(x, d) [1, 2, 3, 4]

pθ(d | x) [ ee+e2+e3+e4 ,

e2

e+e2+e3+e4 ,e3

e+e2+e3+e4 ,e4

e+e2+e3+e4 ]

Parsing: find the top-K derivation trees Dθ(x)

11

Page 31: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Features

Dense features:

• intersection=0.67

• ent-popularity:HIGH

• denoation-size:1

12

Page 32: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Features

Dense features:

• intersection=0.67

• ent-popularity:HIGH

• denoation-size:1

Sparse features:

• bridge-binary:STUDY

• born:PlaceOfBirth

• city:Type.Location

12

Page 33: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Features

Dense features:

• intersection=0.67

• ent-popularity:HIGH

• denoation-size:1

Sparse features:

• bridge-binary:STUDY

• born:PlaceOfBirth

• city:Type.Location

Syntactic features:

• ent-pos:NNP NNP

• join-pos:V NN

• skip-pos:IN

12

Page 34: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Features

Dense features:

• intersection=0.67

• ent-popularity:HIGH

• denoation-size:1

Sparse features:

• bridge-binary:STUDY

• born:PlaceOfBirth

• city:Type.Location

Syntactic features:

• ent-pos:NNP NNP

• join-pos:V NN

• skip-pos:IN

Grammar features:

• Binary->Verb12

Page 35: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Learning θ: maximum-likelihood

Training data:

What’s Bulgaria’s capital?

Sofia

What movies has Tom Cruise been in?

TopGun,VanillaSky,...

...

What’s Bulgaria’s capital?

CapitalOf.Bulgaria

What movies has Tom Cruise been in?

Type.Movie u HasPlayed.TomCruise

...

13

Page 36: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Learning θ: maximum-likelihood

Training data:

What’s Bulgaria’s capital?

Sofia

What movies has Tom Cruise been in?

TopGun,VanillaSky,...

...

What’s Bulgaria’s capital?

CapitalOf.Bulgaria

What movies has Tom Cruise been in?

Type.Movie u HasPlayed.TomCruise

...

argmaxθ∑ni=1 log pθ(y

(i) | x(i)) =

argmaxθ∑ni=1 log

∑d(i) pθ(d

(i) | x(i))R(d(i))

13

Page 37: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Learning θ: maximum-likelihood

Training data:

What’s Bulgaria’s capital?

Sofia

What movies has Tom Cruise been in?

TopGun,VanillaSky,...

...

What’s Bulgaria’s capital?

CapitalOf.Bulgaria

What movies has Tom Cruise been in?

Type.Movie u HasPlayed.TomCruise

...

argmaxθ∑ni=1 log pθ(y

(i) | x(i)) =

argmaxθ∑ni=1 log

∑d(i) pθ(d

(i) | x(i))R(d(i))

R(d) =

{1 d.z = z(i)

0 o/wR(d) =

{1 [d.z]K = y(i)

0 o/wR(d) = F1([d.z]K, y

(i))

13

Page 38: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Optimization: stochastic gradient descent

For every example:

O(θ) = log∑d pθ(d | x)R(d)

∇O(θ) = Eqθ(d|x)[φ(x, d)]− Epθ(d|x)[φ(x, d)]pθ(d | x) ∝ exp(φ(x, d)>θ)

qθ(d | x) ∝ exp(φ(x, d)>θ) ·R(d)

14

Page 39: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Optimization: stochastic gradient descent

For every example:

O(θ) = log∑d pθ(d | x)R(d)

∇O(θ) = Eqθ(d|x)[φ(x, d)]− Epθ(d|x)[φ(x, d)]pθ(d | x) ∝ exp(φ(x, d)>θ)

qθ(d | x) ∝ exp(φ(x, d)>θ) ·R(d)

pθ(D(x)) = [0.2, 0.1, 0.1, 0.6]

R(D(x)) = [1, 0, 0, 1]

14

Page 40: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Optimization: stochastic gradient descent

For every example:

O(θ) = log∑d pθ(d | x)R(d)

∇O(θ) = Eqθ(d|x)[φ(x, d)]− Epθ(d|x)[φ(x, d)]pθ(d | x) ∝ exp(φ(x, d)>θ)

qθ(d | x) ∝ exp(φ(x, d)>θ) ·R(d)

pθ(D(x)) = [0.2, 0.1, 0.1, 0.6]

R(D(x)) = [1, 0, 0, 1]

qθ(D(x)) = [0.25, 0, 0, 0.75]

qθ =pθp>θ R

14

Page 41: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Optimization: stochastic gradient descent

For every example:

O(θ) = log∑d pθ(d | x)R(d)

∇O(θ) = Eqθ(d|x)[φ(x, d)]− Epθ(d|x)[φ(x, d)]pθ(d | x) ∝ exp(φ(x, d)>θ)

qθ(d | x) ∝ exp(φ(x, d)>θ) ·R(d)

pθ(D(x)) = [0.2, 0.1, 0.1, 0.6]

R(D(x)) = [1, 0, 0, 1]

qθ(D(x)) = [0.25, 0, 0, 0.75]

qθ =pθp>θ R

Gradient:

0.05 · φ(x, d1)− 0.1 · φ(x, d2)− 0.1 · φ(x, d3) + 0.15 · φ(x, d4)

14

Page 42: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training

Input: {xi, yi}ni=1

Output: θ

15

Page 43: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training

Input: {xi, yi}ni=1

Output: θ

θ ← 0

15

Page 44: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training

Input: {xi, yi}ni=1

Output: θ

θ ← 0

for iteration τ and example i

D(xi)← argmaxK(pθ(d | xi))

15

Page 45: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training

Input: {xi, yi}ni=1

Output: θ

θ ← 0

for iteration τ and example i

D(xi)← argmaxK(pθ(d | xi))

θ ← θ + ητ,i(Eqθ(d|xi)[φ(xi, d)]− Epθ(d|xi)[φ(xi, d)])

15

Page 46: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training

Input: {xi, yi}ni=1

Output: θ

θ ← 0

for iteration τ and example i

D(xi)← argmaxK(pθ(d | xi))

θ ← θ + ητ,i(Eqθ(d|xi)[φ(xi, d)]− Epθ(d|xi)[φ(xi, d)])

ητ,i: learning rate

Regularization often added (L2, L1, ...)

15

Page 47: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training (structured perceptron)

Input: {xi, yi}ni=1

Output: θ

16

Page 48: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training (structured perceptron)

Input: {xi, yi}ni=1

Output: θ

θ ← 0

16

Page 49: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training (structured perceptron)

Input: {xi, yi}ni=1

Output: θ

θ ← 0

for iteration τ and example i

d̂← argmax(pθ(d | xi))

d∗ ← argmax(qθ(d | xi))

16

Page 50: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training (structured perceptron)

Input: {xi, yi}ni=1

Output: θ

θ ← 0

for iteration τ and example i

d̂← argmax(pθ(d | xi))

d∗ ← argmax(qθ(d | xi))

if [d∗]K 6= [d̂]K

θ ← θ + φ(xi, d∗)− φ(xi, d̂))

16

Page 51: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training (structured perceptron)

Input: {xi, yi}ni=1

Output: θ

θ ← 0

for iteration τ and example i

d̂← argmax(pθ(d | xi))

d∗ ← argmax(qθ(d | xi))

if [d∗]K 6= [d̂]K

θ ← θ + φ(xi, d∗)− φ(xi, d̂))

Regularization often added with weight averaging

16

Page 52: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training

Other simple variants exist:

• E.g., cost-sensitive max-margin training

• That is, find pairs of good and bad derivations that look differentbut have similar scores and update on those

17

Page 53: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Outline

• Learning

– Overview

– Details

– Example

– Lexicon learning

– Supervision signals

18

Page 54: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Example

./run @mode=simple-lambdadcs \

-Grammar.inPaths esslli 2016/class3 demo.grammar \

-SimpleLexicon.inPaths esslli 2016/class3 demo.lexicon

(loadgraph geo880/geo880.kg)

size of california

size capital california

size of capital of california

california size

19

Page 55: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Exercise

Find a pair of natural language utterances that cannot be distinguishedusing the current feature representation

• The utterances can be not fully grammatical in English

• You can ignore the denotation feature if that helps

• Verify this in sempre (ask me how to disable features)

• Design a feature that will solve this problem

20

Page 56: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Outline

• Learning

– Overview

– Details

– Example

– Lexicon learning

– Supervision signals

21

Page 57: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

The lexicon problem

How is the lexicon generated?

• Annotation

• Exhaustive search

• String matching

• Supervised alignment

• Unsupervised alignment

• Learning

22

Page 58: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training

Input: {xi, yi}ni=1

Output: θ

θ ← 0

for iteration τ and example i

Add lexicon entries

D(xi)← argmaxK(pθ(d | xi)

θ ← θ + ητ,i(Eqθ(d|xi)[φ(xi, d)]− Epθ(d|xi)[φ(xi, d)])

ητ,i: learning rate

Regularization often added (L2, L1, ...)

23

Page 59: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Adding lexicon entries

Input: training example (xi, yi), current lexicon Λ, model θ

Λtemp ← Λ ∪ GENLEX(xi, yi) Create expanded temporaty lexicon

[Adapted from semantic parsing tutorial, Artzi et al.]

24

Page 60: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Adding lexicon entries

Input: training example (xi, yi), current lexicon Λ, model θ

Λtemp ← Λ ∪ GENLEX(xi, yi) Create expanded temporaty lexicon

D(xi)← arg maxK(pθ,Λtemp(d | xi)) Parse with temporary lexicon

[Adapted from semantic parsing tutorial, Artzi et al.]

24

Page 61: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Adding lexicon entries

Input: training example (xi, yi), current lexicon Λ, model θ

Λtemp ← Λ ∪ GENLEX(xi, yi) Create expanded temporaty lexicon

D(xi)← arg maxK(pθ,Λtemp(d | xi)) Parse with temporary lexicon

Λ = Λ ∪ {l | l ∈ d̂, d̂ ∈ D(xi), R(d) = 1} Add entries from correct trees

[Adapted from semantic parsing tutorial, Artzi et al.]

24

Page 62: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Adding lexicon entries

Input: training example (xi, yi), current lexicon Λ, model θ

Λtemp ← Λ ∪ GENLEX(xi, yi) Create expanded temporaty lexicon

D(xi)← arg maxK(pθ,Λtemp(d | xi)) Parse with temporary lexicon

Λ = Λ ∪ {l | l ∈ d̂, d̂ ∈ D(xi), R(d) = 1} Add entries from correct trees

Overgenerate lexical entries and add promising ones

[Adapted from semantic parsing tutorial, Artzi et al.]

24

Page 63: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Lexicon generation

Logical form supervision:

Largest state bordering California

argmax(Type(State) u Border(California),Area)

[Zettlemoyer and Collins, 2005]

25

Page 64: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Lexicon generation

Logical form supervision:

Largest state bordering California

argmax(Type(State) u Border(California),Area)

Enumerate spans

Largest

state

bordering

California

Largest state

. . .

Use rules to extract sub-forumulas

California

Border(California)

λf.f(California)

Area

λx.argmax(x,Area)

. . .

[Zettlemoyer and Collins, 2005]

25

Page 65: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Lexicon generation

Logical form supervision:

Largest state bordering California

argmax(Type(State) u Border(California),Area)

Enumerate spans

Largest

state

bordering

California

Largest state

. . .

Use rules to extract sub-forumulas

California

Border(California)

λf.f(California)

Area

λx.argmax(x,Area)

. . .

Add cross-product to lexicon

[Zettlemoyer and Collins, 2005]

25

Page 66: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Lexicon generation

Denotation supervision:

Largest state bordering California

Arizona

26

Page 67: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Lexicon generation

Denotation supervision:

Largest state bordering California

Arizona

Enumerate spans

Largest

state

bordering

California

Largest state

. . .

Generate sub-formulas from KB

California

Border(California)

Traverse

Type.Mountain

λx.argmax(x,Elevation)

. . .

Restrict candidates with alignment, string matching, ...

26

Page 68: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Lexicon generation

Denotation supervision:

Largest state bordering California

Arizona

Enumerate spans

Largest

state

bordering

California

Largest state

. . .

Generate sub-formulas from KB

California

Border(California)

Traverse

Type.Mountain

λx.argmax(x,Elevation)

. . .

Restrict candidates with alignment, string matching, ...

Fancier methods exist (coarse-to-fine)

26

Page 69: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Unification

Logical form supervision:

[Kwiatkowski et al, 2010]

27

Page 70: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Unification

Logical form supervision:

Initialize lexicon with (xi, zi):

States bordering California

Type(State) u Border(California)

Split lexical entry in all possible ways:

[Kwiatkowski et al, 2010]

27

Page 71: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Unification

Logical form supervision:

Initialize lexicon with (xi, zi):

States bordering California

Type(State) u Border(California)

Split lexical entry in all possible ways:

Enumerate spans

(states, bordering california)

(states bordering, california)

Generate sub-formulas from KB

(Type(State), λx.x u Border(California))

(λx.Type(State) u x,Border(California))

(λf.Type(State) u f(California),California)

. . .

[Kwiatkowski et al, 2010]

27

Page 72: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Unification

For example xi, zi:

[Kwiatkowski et al, 2010]

28

Page 73: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Unification

For example xi, zi:

Find highest scoring correct parse d∗

[Kwiatkowski et al, 2010]

28

Page 74: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Unification

For example xi, zi:

Find highest scoring correct parse d∗

Split all lexical entries in d∗ in all possible ways

[Kwiatkowski et al, 2010]

28

Page 75: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Unification

For example xi, zi:

Find highest scoring correct parse d∗

Split all lexical entries in d∗ in all possible ways

Add to lexicon lexical entry that improves parse score best

[Kwiatkowski et al, 2010]

28

Page 76: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Do we need a lexicon?

Type(State) u Border(California)

Type(State)

Type State

Border(California)

Border California

California neighbors

29

Page 77: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Do we need a lexicon?

Type(State) u Border(California)

Type(State)

Type State

Border(California)

Border California

California neighbors

Floating parse tree: generalization of bridging

29

Page 78: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Do we need a lexicon?

Type(State) u Border(California)

Type(State)

Type State

Border(California)

Border California

California neighbors

Floating parse tree: generalization of bridging

Perhaps with better learning and search not necessary?

29

Page 79: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Outline

• Learning

– Overview

– Details

– Example

– Lexicon learning

– Supervision signals

30

Page 80: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Supervision signals

We discussed training from logical forms and denotations

31

Page 81: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Supervision signals

We discussed training from logical forms and denotations

Other forms of supervision have been proposed:

• Demonstrations

• Distant supervision

• Conversations

• Unsupervised

• Paraphrasing

31

Page 82: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training from demonstrations

Input: (xi, si, ti)

xi: utterance

si: start state

ti: end state

move forward until you reach the intersection

[Artzi and Zettlemoyer, 2013]

32

Page 83: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training from demonstrations

Input: (xi, si, ti)

xi: utterance

si: start state

ti: end state

move forward until you reach the intersection

λa.move(a) ∧ dir(a.forward) . . .

[Artzi and Zettlemoyer, 2013]

32

Page 84: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training from demonstrations

Input: (xi, si, ti)

xi: utterance

si: start state

ti: end state

move forward until you reach the intersection

λa.move(a) ∧ dir(a.forward) . . .

An instance of learning from denotations

[Artzi and Zettlemoyer, 2013]

32

Page 85: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Distant supervision

Data generation:

• Decompose declarative text to questions and answers

James Cameron is the director of Titanic

Q: X is the director of Titanic A: James Cameron

[Reddy et al., 2014]

33

Page 86: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Distant supervision

Data generation:

• Decompose declarative text to questions and answers

James Cameron is the director of Titanic

Q: X is the director of Titanic A: James Cameron

Declarative text is cheap!

[Reddy et al., 2014]

33

Page 87: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Distant supervision

Training:

• Use exising non-executable semantic parsers

X is the director of Titanic

[Reddy et al., 2014]

34

Page 88: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Distant supervision

Training:

• Use exising non-executable semantic parsers

X is the director of Titanic

λx.director(x) ∧ director.of.arg1(e, x) ∧ director.of.arg2(e,Titanic)

[Reddy et al., 2014]

34

Page 89: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Distant supervision

Training:

• Use exising non-executable semantic parsers

X is the director of Titanic

λx.director(x) ∧ director.of.arg1(e, x) ∧ director.of.arg2(e,Titanic)

λx.Director(x) ∧ FilmDirectedBy(e, x) ∧ FileDirected(e,Titanic)

λx.Producer(x) ∧ FilmProducedBy(e, x) ∧ FileProduced(e,Titanic)

[Reddy et al., 2014]

34

Page 90: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Distant supervision

Training:

• Use exising non-executable semantic parsers

X is the director of Titanic

λx.director(x) ∧ director.of.arg1(e, x) ∧ director.of.arg2(e,Titanic)

λx.Director(x) ∧ FilmDirectedBy(e, x) ∧ FileDirected(e,Titanic)

λx.Producer(x) ∧ FilmProducedBy(e, x) ∧ FileProduced(e,Titanic)

James Cameron true

James Camerson, Jon Landau false

[Reddy et al., 2014]

34

Page 91: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training from conversations

35

Page 92: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training from conversations

z1 : From(Atlanta) u To(London)

z2 : From(Atlanta) u From(London)

z3 : To(Atlanta) u To(London)

z4 : To(Atlanta) u From(London)

35

Page 93: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Training from conversations

z1 : From(Atlanta) u To(London)

z2 : From(Atlanta) u From(London)

z3 : To(Atlanta) u To(London)

z4 : To(Atlanta) u From(London)

Define loss:

• Does z align with converation?

• Does z obey domain constraints?

35

Page 94: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Unsupervised learning

Intuition: assume repeating patterns are correct

[Goldwasser et al., 2011]

36

Page 95: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Unsupervised learning

Intuition: assume repeating patterns are correct

Input: {xi}ni=1

Output: θ

[Goldwasser et al., 2011]

36

Page 96: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Unsupervised learning

Intuition: assume repeating patterns are correct

Input: {xi}ni=1

Output: θ

θ initialized manually, S = φ

[Goldwasser et al., 2011]

36

Page 97: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Unsupervised learning

Intuition: assume repeating patterns are correct

Input: {xi}ni=1

Output: θ

θ initialized manually, S = φ

Until stopping criterion met

for example xi

S = S ∪ (xi, argmax pθ(d | xi))

[Goldwasser et al., 2011]

36

Page 98: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Unsupervised learning

Intuition: assume repeating patterns are correct

Input: {xi}ni=1

Output: θ

θ initialized manually, S = φ

Until stopping criterion met

for example xi

S = S ∪ (xi, argmax pθ(d | xi))Compute statistics of S

[Goldwasser et al., 2011]

36

Page 99: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Unsupervised learning

Intuition: assume repeating patterns are correct

Input: {xi}ni=1

Output: θ

θ initialized manually, S = φ

Until stopping criterion met

for example xi

S = S ∪ (xi, argmax pθ(d | xi))Compute statistics of S

Sconf ←find confident subset

[Goldwasser et al., 2011]

36

Page 100: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Unsupervised learning

Intuition: assume repeating patterns are correct

Input: {xi}ni=1

Output: θ

θ initialized manually, S = φ

Until stopping criterion met

for example xi

S = S ∪ (xi, argmax pθ(d | xi))Compute statistics of S

Sconf ←find confident subset

Train on Sconf

[Goldwasser et al., 2011]

36

Page 101: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Unsupervised learning

Intuition: assume repeating patterns are correct

Input: {xi}ni=1

Output: θ

θ initialized manually, S = φ

Until stopping criterion met

for example xi

S = S ∪ (xi, argmax pθ(d | xi))Compute statistics of S

Sconf ←find confident subset

Train on Sconf

Substantially lower performance

[Goldwasser et al., 2011]

36

Page 102: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Paraphrasing

What languages do people in Brazil use?

B. and Liang, 2014

37

Page 103: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Paraphrasing

What languages do people in Brazil use?

Type.HumanLanguage u LanguagesSpoken.Brazil ... CapitalOf.Brazil

B. and Liang, 2014

37

Page 104: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Paraphrasing

What languages do people in Brazil use?

What language is the language of Brazil? ... What city is the capital of Brazil?

Type.HumanLanguage u LanguagesSpoken.Brazil ... CapitalOf.Brazil

B. and Liang, 2014

37

Page 105: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Paraphrasing

What languages do people in Brazil use?

paraphrase model

What language is the language of Brazil? ... What city is the capital of Brazil?

Type.HumanLanguage u LanguagesSpoken.Brazil ... CapitalOf.Brazil

B. and Liang, 2014

37

Page 106: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Paraphrasing

What languages do people in Brazil use?

paraphrase model

What language is the language of Brazil? ... What city is the capital of Brazil?

Type.HumanLanguage u LanguagesSpoken.Brazil ... CapitalOf.Brazil

Portuguese, ...

B. and Liang, 2014

37

Page 107: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Paraphrasing

What languages do people in Brazil use?

paraphrase model

What language is the language of Brazil? ... What city is the capital of Brazil?

Type.HumanLanguage u LanguagesSpoken.Brazil ... CapitalOf.Brazil

Portuguese, ...

Model: pθ(c, z | x) = p(z | x)× pθ(c | x)

Idea: train a large paraphrase model pθ(c | x)

B. and Liang, 2014

37

Page 108: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Paraphrasing

What languages do people in Brazil use?

paraphrase model

What language is the language of Brazil? ... What city is the capital of Brazil?

Type.HumanLanguage u LanguagesSpoken.Brazil ... CapitalOf.Brazil

Portuguese, ...

Model: pθ(c, z | x) = p(z | x)× pθ(c | x)

Idea: train a large paraphrase model pθ(c | x)

More later

B. and Liang, 2014

37

Page 109: Outline - Stanford NLP Groupnlp.stanford.edu/joberant/esslli_2016/class3_index.pdfSupervision in syntactic parsing Input : S NP NP ESSLLI 2016 NP the known summer school VP V is VP

Summary

We saw how to train from denotations and logical forms

We see methods for inducing lexicons during training

We reviewed work on how to use even weaker forms of supervision

Still an open problem

38