Outline
• Learning
– Overview
– Details
– Example
– Lexicon learning
– Supervision signals
Supervision in syntactic parsing
Input: a sentence with its full parse tree
(S (NP (NP ESSLLI 2016) (NP the known summer school))
   (VP (V is) (VP (V located) (PP in Bolzano))))
Output: a parse tree for a new sentence
They play football
(S (NP They) (VP (V play) (NP football)))
Supervision in semantic parsing
Input:
Heavy supervision (logical forms):
How tall is Lebron James? → HeightOf.LebronJames
What is Steph Curry’s daughter called? → ChildrenOf.StephCurry ⊓ Gender.Female
Youngest player of the Cavaliers → argmin(PlayerOf.Cavaliers, BirthDateOf)
...
Light supervision (denotations):
How tall is Lebron James? → 203cm
What is Steph Curry’s daughter called? → Riley Curry
Youngest player of the Cavaliers → Kyrie Irving
...
Output: a derivation for a new utterance
Clay Thompson’s weight
WeightOf.ClayThompson ⇒ 205 lbs
├─ ClayThompson ← "Clay Thompson"
└─ WeightOf ← "’s weight"
[Zelle & Mooney, 1996; Zettlemoyer & Collins, 2005; Clarke et al., 2010; Liang et al., 2011]
Learning in a nutshell
utterance → Parsing → candidate derivations → Label → Update model
0. Define model for derivations
1. Generate candidate derivations (later)
2. Label as correct and incorrect
3. Update model to favor correct trees
Training intuition
Where did Mozart tupress?
PlaceOfBirth.WolfgangMozart ⇒ Salzburg
PlaceOfDeath.WolfgangMozart ⇒ Vienna
PlaceOfMarriage.WolfgangMozart ⇒ Vienna
Answer: Vienna
Where did Hogarth tupress?
PlaceOfBirth.WilliamHogarth ⇒ London
PlaceOfDeath.WilliamHogarth ⇒ London
PlaceOfMarriage.WilliamHogarth ⇒ Paddington
Answer: London
Only PlaceOfDeath is consistent with both answers, so training shifts weight toward mapping the unknown verb "tupress" to PlaceOfDeath.
Outline
• Learning
– Overview
– Details
– Example
– Lexicon learning
– Supervision signals
Constructing derivations
Type.Person ⊓ PlaceLived.Chicago     [intersect]
├─ Type.Person ← "people"            [lexicon]
└─ PlaceLived.Chicago                [join]
   ├─ PlaceLived ← "lived in"        [lexicon]
   └─ Chicago ← "Chicago"            [lexicon]
(the word "who" is skipped)
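To make this concrete, here is a minimal sketch (illustrative only, not SEMPRE's actual classes) of representing such a derivation tree, with lexicon entries as leaves and join/intersect as internal nodes:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Derivation:
    rule: str                # "lexicon", "join", or "intersect"
    logical_form: str        # lambda-DCS formula built so far
    span: str = ""           # utterance span covered (lexicon leaves)
    children: List["Derivation"] = field(default_factory=list)

people = Derivation("lexicon", "Type.Person", span="people")
lived = Derivation("lexicon", "PlaceLived", span="lived in")
chicago = Derivation("lexicon", "Chicago", span="Chicago")
root = Derivation("intersect", "Type.Person ⊓ PlaceLived.Chicago",
                  children=[people,
                            Derivation("join", "PlaceLived.Chicago",
                                       children=[lived, chicago])])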
Many possible derivations!
x = people who have lived in Chicago
Set of candidate derivations D(x):

Type.Person ⊓ PlaceLived.Chicago       [intersect]
├─ Type.Person ← "people"              [lexicon]
└─ PlaceLived.Chicago                  [join]
   ├─ PlaceLived ← "lived in"          [lexicon]
   └─ Chicago ← "Chicago"              [lexicon]

Type.Org ⊓ PresentIn.ChicagoMusical    [intersect]
├─ Type.Org ← "people"                 [lexicon]
└─ PresentIn.ChicagoMusical            [join]
   ├─ PresentIn ← "lived in"           [lexicon]
   └─ ChicagoMusical ← "Chicago"       [lexicon]
x: utterance
d: derivation (e.g., the tree for "people who have lived in Chicago" above)
Feature vector and parameters in R^F:

feature                      φ(x, d)   θ (learned)
apply join                   1         1.2
apply intersect              1         0.6
apply lexicon                3         2.1
lived maps to PlacesLived    1         3.1
lived maps to PlaceOfBirth   0         -0.4
born maps to PlaceOfBirth    0         2.7
...                          ...       ...

Scoreθ(x, d) = φ(x, d)ᵀθ = 1.2·1 + 0.6·1 + 2.1·3 + 3.1·1 + (−0.4)·0 + 2.7·0 + ...
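A minimal sketch of this dot product, using the feature counts and (made-up) weights from the table:

phi = {"apply join": 1, "apply intersect": 1, "apply lexicon": 3,
       "lived maps to PlacesLived": 1}
theta = {"apply join": 1.2, "apply intersect": 0.6, "apply lexicon": 2.1,
         "lived maps to PlacesLived": 3.1, "lived maps to PlaceOfBirth": -0.4,
         "born maps to PlaceOfBirth": 2.7}

# Score(x, d) = phi(x, d) . theta; features absent from phi contribute 0
score = sum(count * theta.get(f, 0.0) for f, count in phi.items())
print(score)  # 1.2*1 + 0.6*1 + 2.1*3 + 3.1*1 = 11.2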
Deep learning alert!
The feature vector φ(x, d) is constructed by hand
Constructing good features is hard
Algorithms are likely to do it better
Perhaps we can train φ(x, d):
φ(x, d) = Fψ(x, d), where ψ are the parameters
Log-linear model
Candidate derivations: D(x)
Model: distribution over derivations d given utterance x

pθ(d | x) = exp(Scoreθ(x, d)) / ∑_{d′ ∈ D(x)} exp(Scoreθ(x, d′))

For example, with four candidates:
Scoreθ(x, d) = [1, 2, 3, 4]
pθ(d | x) = [e¹, e², e³, e⁴] / (e¹ + e² + e³ + e⁴)

Parsing: find the top-K derivation trees Dθ(x)
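A minimal sketch of this distribution: a softmax over derivation scores, reproducing the [1, 2, 3, 4] example above:

import math

def derivation_distribution(scores):
    # p(d | x) = exp(score) normalized over all candidates in D(x)
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

print(derivation_distribution([1, 2, 3, 4]))
# ≈ [0.032, 0.087, 0.237, 0.644]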
Features
Dense features:
• intersection=0.67
• ent-popularity:HIGH
• denotation-size:1
Sparse features:
• bridge-binary:STUDY
• born:PlaceOfBirth
• city:Type.Location
Syntactic features:
• ent-pos:NNP NNP
• join-pos:V NN
• skip-pos:IN
Grammar features:
• Binary->Verb
Learning θ: maximum-likelihood
Training data: utterances paired with denotations or with logical forms:
What’s Bulgaria’s capital? → Sofia
What movies has Tom Cruise been in? → TopGun, VanillaSky, ...
...
What’s Bulgaria’s capital? → CapitalOf.Bulgaria
What movies has Tom Cruise been in? → Type.Movie ⊓ HasPlayed.TomCruise
...

argmaxθ ∑_{i=1}^n log pθ(y⁽ⁱ⁾ | x⁽ⁱ⁾) = argmaxθ ∑_{i=1}^n log ∑_{d⁽ⁱ⁾} pθ(d⁽ⁱ⁾ | x⁽ⁱ⁾) R(d⁽ⁱ⁾)

R(d) = 1 if d.z = z⁽ⁱ⁾, 0 otherwise          (logical-form supervision)
R(d) = 1 if [d.z]K = y⁽ⁱ⁾, 0 otherwise       (denotation supervision)
R(d) = F1([d.z]K, y⁽ⁱ⁾)                      (partial credit on denotations)
Optimization: stochastic gradient descent
For every example:
O(θ) = log ∑_d pθ(d | x) R(d)
∇O(θ) = E_{qθ(d|x)}[φ(x, d)] − E_{pθ(d|x)}[φ(x, d)]
where
pθ(d | x) ∝ exp(φ(x, d)ᵀθ)
qθ(d | x) ∝ exp(φ(x, d)ᵀθ) · R(d)

For example:
pθ(D(x)) = [0.2, 0.1, 0.1, 0.6]
R(D(x)) = [1, 0, 0, 1]
qθ(D(x)) = [0.25, 0, 0, 0.75]    (qθ = pθ ⊙ R / pθᵀR)

Gradient:
0.05·φ(x, d1) − 0.1·φ(x, d2) − 0.1·φ(x, d3) + 0.15·φ(x, d4)
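A minimal numeric sketch of this step: q is p restricted to the correct derivations (R = 1) and renormalized, and each derivation's features are weighted by (q − p):

p = [0.2, 0.1, 0.1, 0.6]   # p over D(x)
R = [1, 0, 0, 1]           # reward per derivation

z = sum(pi * ri for pi, ri in zip(p, R))      # p.R = 0.8
q = [pi * ri / z for pi, ri in zip(p, R)]     # [0.25, 0.0, 0.0, 0.75]
coeffs = [qi - pi for qi, pi in zip(q, p)]    # [0.05, -0.1, -0.1, 0.15]
print(q, coeffs)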
Training
Input: {xi, yi}, i = 1..n
Output: θ

θ ← 0
for iteration τ and example i:
  D(xi) ← argmaxK(pθ(d | xi))
  θ ← θ + ητ,i (E_{qθ(d|xi)}[φ(xi, d)] − E_{pθ(d|xi)}[φ(xi, d)])

ητ,i: learning rate
Regularization often added (L2, L1, ...)
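A minimal sketch of this loop, assuming hypothetical helpers: parse_top_k(x, theta) returns candidate derivations, features(x, d) returns a feature dict, and reward(d, y) implements R(d) for the chosen supervision:

import math
from collections import defaultdict

def sgd_train(examples, parse_top_k, features, reward, epochs=5, eta=0.1):
    theta = defaultdict(float)
    for _ in range(epochs):
        for x, y in examples:
            derivs = parse_top_k(x, theta)
            scores = [sum(theta[f] * v for f, v in features(x, d).items())
                      for d in derivs]
            z = sum(math.exp(s) for s in scores)
            p = [math.exp(s) / z for s in scores]
            zq = sum(pi * reward(d, y) for pi, d in zip(p, derivs))
            if zq == 0:
                continue  # no correct derivation in the beam; skip example
            q = [pi * reward(d, y) / zq for pi, d in zip(p, derivs)]
            # theta += eta * (E_q[phi] - E_p[phi])
            for pi, qi, d in zip(p, q, derivs):
                for f, v in features(x, d).items():
                    theta[f] += eta * (qi - pi) * v
    return theta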
Training (structured perceptron)
Input: {xi, yi}, i = 1..n
Output: θ

θ ← 0
for iteration τ and example i:
  d̂ ← argmax(pθ(d | xi))
  d* ← argmax(qθ(d | xi))
  if [d*]K ≠ [d̂]K:
    θ ← θ + φ(xi, d*) − φ(xi, d̂)

Regularization often added with weight averaging
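A minimal sketch of the perceptron update, with features and denotation as hypothetical stand-ins for φ and [·]K:

def perceptron_update(theta, x, d_hat, d_star, features, denotation):
    # Update only when the model's best derivation d_hat and the best
    # correct derivation d_star disagree on their denotations.
    if denotation(d_star) != denotation(d_hat):
        for f, v in features(x, d_star).items():
            theta[f] += v
        for f, v in features(x, d_hat).items():
            theta[f] -= v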
Training
Other simple variants exist:
• E.g., cost-sensitive max-margin training
• That is, find pairs of good and bad derivations that look different but have similar scores and update on those
Outline
• Learning
– Overview
– Details
– Example
– Lexicon learning
– Supervision signals
Example
./run @mode=simple-lambdadcs \
  -Grammar.inPaths esslli_2016/class3_demo.grammar \
  -SimpleLexicon.inPaths esslli_2016/class3_demo.lexicon
(loadgraph geo880/geo880.kg)
size of california
size capital california
size of capital of california
california size
Exercise
Find a pair of natural language utterances that cannot be distinguished using the current feature representation
• The utterances need not be fully grammatical English
• You can ignore the denotation feature if that helps
• Verify this in sempre (ask me how to disable features)
• Design a feature that will solve this problem
Outline
• Learning
– Overview
– Details
– Example
– Lexicon learning
– Supervision signals
The lexicon problem
How is the lexicon generated?
• Annotation
• Exhaustive search
• String matching
• Supervised alignment
• Unsupervised alignment
• Learning
Training
Input: {xi, yi}, i = 1..n
Output: θ

θ ← 0
for iteration τ and example i:
  Add lexicon entries
  D(xi) ← argmaxK(pθ(d | xi))
  θ ← θ + ητ,i (E_{qθ(d|xi)}[φ(xi, d)] − E_{pθ(d|xi)}[φ(xi, d)])

ητ,i: learning rate
Regularization often added (L2, L1, ...)
Adding lexicon entries
Input: training example (xi, yi), current lexicon Λ, model θ
Λtemp ← Λ ∪ GENLEX(xi, yi)                      Create expanded temporary lexicon
D(xi) ← argmaxK(pθ,Λtemp(d | xi))               Parse with temporary lexicon
Λ ← Λ ∪ {l | l ∈ d̂, d̂ ∈ D(xi), R(d̂) = 1}       Add entries from correct trees
Overgenerate lexical entries and add promising ones
[Adapted from semantic parsing tutorial, Artzi et al.]
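A minimal sketch of this expansion step, assuming hypothetical helpers genlex(x, y), parse_top_k(x, theta, lexicon), and reward(d, y), with lexicons as sets:

def expand_lexicon(lexicon, x, y, theta, genlex, parse_top_k, reward):
    temp = lexicon | genlex(x, y)           # overgenerate candidate entries
    derivs = parse_top_k(x, theta, temp)    # parse with the temporary lexicon
    for d in derivs:
        if reward(d, y) == 1:               # keep entries from correct trees
            lexicon |= set(d.lexical_entries)
    return lexicon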
Lexicon generation
Logical form supervision:
Largest state bordering California
argmax(Type(State) ⊓ Border(California), Area)

Enumerate spans:
  Largest
  state
  bordering
  California
  Largest state
  ...
Use rules to extract sub-formulas:
  California
  Border(California)
  λf.f(California)
  Area
  λx.argmax(x, Area)
  ...
Add cross-product to lexicon
[Zettlemoyer and Collins, 2005]
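A minimal sketch of the cross-product construction (the actual GENLEX restricts the pairs with category and typing rules; max_len is an arbitrary choice here):

from itertools import product

def enumerate_spans(tokens, max_len=3):
    return [" ".join(tokens[i:j])
            for i in range(len(tokens))
            for j in range(i + 1, min(i + 1 + max_len, len(tokens) + 1))]

tokens = "Largest state bordering California".split()
subformulas = ["California", "Border(California)", "λf.f(California)",
               "Area", "λx.argmax(x, Area)"]
candidate_entries = set(product(enumerate_spans(tokens), subformulas))
# |spans| x |sub-formulas| candidate lexical entries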
Lexicon generation
Denotation supervision:
Largest state bordering California → Arizona

Enumerate spans:
  Largest
  state
  bordering
  California
  Largest state
  ...
Generate sub-formulas from KB:
  California
  Border(California)
  Traverse
  Type.Mountain
  λx.argmax(x, Elevation)
  ...
Restrict candidates with alignment, string matching, ...
Fancier methods exist (coarse-to-fine)
Unification
Logical form supervision:
Initialize lexicon with (xi, zi):
  States bordering California
  Type(State) ⊓ Border(California)
Split lexical entry in all possible ways:
Enumerate spans:
  (states, bordering california)
  (states bordering, california)
Generate sub-formulas:
  (Type(State), λx.x ⊓ Border(California))
  (λx.Type(State) ⊓ x, Border(California))
  (λf.Type(State) ⊓ f(California), California)
  ...
[Kwiatkowski et al., 2010]
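The span half of the splitting is easy to write down; a minimal sketch (the logical-form half, pairing each span split with type-consistent formula splits, is the hard part and is omitted):

def split_spans(tokens):
    # all ways to cut the utterance into two contiguous parts
    return [(" ".join(tokens[:i]), " ".join(tokens[i:]))
            for i in range(1, len(tokens))]

print(split_spans("states bordering california".split()))
# [('states', 'bordering california'), ('states bordering', 'california')]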
Unification
For example (xi, zi):
Find the highest-scoring correct parse d*
Split all lexical entries in d* in all possible ways
Add to the lexicon the lexical entry that most improves the parse score
[Kwiatkowski et al., 2010]
Do we need a lexicon?
Utterance: California neighbors
Type(State) ⊓ Border(California)
├─ Type(State) ← Type, State          (floating: not anchored to any words)
└─ Border(California) ← Border, California
Floating parse tree: generalization of bridging
Perhaps with better learning and search a lexicon is not necessary?
Outline
• Learning
– Overview
– Details
– Example
– Lexicon learning
– Supervision signals
Supervision signals
We discussed training from logical forms and denotations
Other forms of supervision have been proposed:
• Demonstrations
• Distant supervision
• Conversations
• Unsupervised
• Paraphrasing
Training from demonstrations
Input: (xi, si, ti)
xi: utterance
si: start state
ti: end state
move forward until you reach the intersection
λa.move(a) ∧ dir(a.forward) ...
An instance of learning from denotations
[Artzi and Zettlemoyer, 2013]
Distant supervision
Data generation:
• Decompose declarative text into questions and answers
James Cameron is the director of Titanic
Q: X is the director of Titanic   A: James Cameron
Declarative text is cheap!
[Reddy et al., 2014]
Distant supervision
Training:
• Use existing non-executable semantic parsers
X is the director of Titanic
λx.director(x) ∧ director.of.arg1(e, x) ∧ director.of.arg2(e, Titanic)
Candidate KB groundings and their denotations:
λx.Director(x) ∧ FilmDirectedBy(e, x) ∧ FilmDirected(e, Titanic) ⇒ James Cameron (true)
λx.Producer(x) ∧ FilmProducedBy(e, x) ∧ FilmProduced(e, Titanic) ⇒ James Cameron, Jon Landau (false)
[Reddy et al., 2014]
Training from conversations
z1: From(Atlanta) ⊓ To(London)
z2: From(Atlanta) ⊓ From(London)
z3: To(Atlanta) ⊓ To(London)
z4: To(Atlanta) ⊓ From(London)
Define loss:
• Does z align with the conversation?
• Does z obey domain constraints?
Unsupervised learning
Intuition: assume repeating patterns are correct
Input: {xi}, i = 1..n
Output: θ

θ initialized manually, S = ∅
Until stopping criterion met:
  for example xi:
    S ← S ∪ {(xi, argmax pθ(d | xi))}
  Compute statistics of S
  Sconf ← find confident subset
  Train on Sconf

Substantially lower performance
[Goldwasser et al., 2011]
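A minimal sketch of this self-training loop, with parse_best, confidence, and train as hypothetical stand-ins (the 0.9 threshold is likewise made up):

def unsupervised_train(xs, theta, parse_best, confidence, train, rounds=10):
    for _ in range(rounds):
        # self-label every utterance with the current best derivation
        S = [(x, parse_best(x, theta)) for x in xs]
        # keep only predictions that corpus-level statistics make confident
        S_conf = [pair for pair in S if confidence(pair, S) > 0.9]
        theta = train(theta, S_conf)
    return theta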
Paraphrasing
Input utterance: What languages do people in Brazil use?
Generate candidate canonical utterances, each paired with a logical form:
  What language is the language of Brazil? → Type.HumanLanguage ⊓ LanguagesSpoken.Brazil
  What city is the capital of Brazil? → CapitalOf.Brazil
  ...
Score each candidate against the input with a paraphrase model; executing the best logical form yields: Portuguese, ...

Model: pθ(c, z | x) = p(z | x) × pθ(c | x)
Idea: train a large paraphrase model pθ(c | x)
More later
[B. and Liang, 2014]
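A minimal sketch of ranking by paraphrase score; toy_score is a crude word-overlap stand-in for the learned model pθ(c | x):

def best_logical_form(x, candidates, score):
    # candidates: (canonical utterance c, logical form z) pairs
    c, z = max(candidates, key=lambda cz: score(x, cz[0]))
    return z

def toy_score(x, c):
    norm = lambda s: {w.strip("?s") for w in s.lower().split()}
    return len(norm(x) & norm(c))

candidates = [
    ("What language is the language of Brazil?",
     "Type.HumanLanguage ⊓ LanguagesSpoken.Brazil"),
    ("What city is the capital of Brazil?", "CapitalOf.Brazil"),
]
print(best_logical_form("What languages do people in Brazil use?",
                        candidates, toy_score))
# Type.HumanLanguage ⊓ LanguagesSpoken.Brazil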
Summary
We saw how to train from denotations and logical forms
We saw methods for inducing lexicons during training
We reviewed work on using even weaker forms of supervision
Still an open problem