learnability and semantic universals
TRANSCRIPT
Learnability and Semantic Universals
Shane Steinert-Threlkeld (joint work-in-progress with Jakub Szymanik)
Cognitive Semantics and Quantities Kick-off Workshop, September 28, 2017
Overview
1 Semantic Universals
2 The Learning Model
3 Experiments
4 Conclusion
Universals in Linguistic Theory
Question
What is the range of variation in human languages? That is: which, out of all of the logically possible languages that humans could speak, do they in fact speak?
Example Universals
Sound (phonology): All languages have consonants and vowels. Every language has at least /a i u/.
Grammar (syntax): All languages have verbs and nouns.
Meaning (semantics): All languages have syntactic constituents (NPs) whose semantic function is to express generalized quantifiers. (Barwise and Cooper 1981)
Determiners
Determiners:
Simple: every, some, few, most, five, . . .
Complex: all but five, fewer than three, at least eight or fewer than five, . . .
Denote type 〈1, 1〉 generalized quantifiers: sets of models of the form 〈M, A, B〉 with A, B ⊆ M
For example:
⟦every⟧ = {〈M, A, B〉 : A ⊆ B}
⟦most⟧ = {〈M, A, B〉 : |A ∩ B| > |A \ B|}
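For concreteness, these denotations can be coded directly as predicates on finite models. A minimal Python sketch (the function names and the toy model are illustrative, not from the talk):

```python
# A type-<1,1> generalized quantifier, viewed as a predicate on finite
# models <M, A, B> with A, B subsets of M (sets of hashable individuals).

def every(M, A, B):
    """<M, A, B> is in [[every]] iff A is a subset of B."""
    return A <= B                      # set inclusion

def most(M, A, B):
    """<M, A, B> is in [[most]] iff |A ∩ B| > |A \\ B|."""
    return len(A & B) > len(A - B)

# Toy model: A = students, B = smokers.
M = {1, 2, 3, 4}
A = {1, 2, 3}
B = {1, 2, 4}
print(every(M, A, B))   # False: individual 3 is in A but not in B
print(most(M, A, B))    # True: |A ∩ B| = 2 > |A \ B| = 1
```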
Monotonicity
Many French people smoke cigarettes ⇒ Many French people smoke
Few French people smoke ⇒ Few French people smoke cigarettes
Q is upward (resp. downward) monotone iff: if 〈M,A,B〉 ∈ Q and B ⊆ B′ (resp. B′ ⊆ B), then 〈M,A,B′〉 ∈ Q
A determiner is monotone if it denotes either an upward or downward monotone generalized quantifier
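On a finite universe, upward monotonicity can be checked by brute force. A small sketch, reusing `every` from the snippet above (purely illustrative, and exponential in |M|):

```python
from itertools import chain, combinations

def subsets(S):
    """All subsets of a finite set S."""
    S = list(S)
    return [set(c) for c in chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))]

def is_upward_monotone(Q, M):
    """Check that whenever Q(M, A, B) holds and B ⊆ B2, Q(M, A, B2) also holds."""
    for A in subsets(M):
        for B in subsets(M):
            if not Q(M, A, B):
                continue
            for B2 in subsets(M):
                if B <= B2 and not Q(M, A, B2):
                    return False
    return True

exactly_two = lambda M, A, B: len(A & B) == 2
print(is_upward_monotone(every, {1, 2, 3}))        # True
print(is_upward_monotone(exactly_two, {1, 2, 3}))  # False: enlarging B can break it
```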
Monotonicity Universal (Barwise and Cooper 1981)
All simple determiners are monotone.
Quantity
Key intuition: determiners denote general relations between their restrictor and nuclear scope, not ones that depend on the identities of particular elements of the domain
Q is permutation-closed iff: if 〈M,A,B〉 ∈ Q and π is a permutation of M, then π(〈M,A,B〉) ∈ Q
Q is isomorphism-closed iff: if 〈M,A,B〉 ∈ Q and 〈M,A,B〉 ≅ 〈M′,A′,B′〉, then 〈M′,A′,B′〉 ∈ Q
Q is quantitative iff: if 〈M,A,B〉 ∈ Q and A ∩ B, A \ B, B \ A, M \ (A ∪ B) have the same cardinality as their primed counterparts, then 〈M′,A′,B′〉 ∈ Q
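Quantitativity thus says that truth may depend only on the four cell cardinalities |A ∩ B|, |A \ B|, |B \ A|, |M \ (A ∪ B)|. A hedged sketch (illustrative names): any quantifier built from those four numbers is quantitative by construction, while one that tracks a particular individual is not.

```python
def from_counts(f):
    """Lift a function of the four cell cardinalities to a quantifier on
    models <M, A, B>; anything built this way only 'sees' the counts."""
    return lambda M, A, B: f(len(A & B), len(A - B), len(B - A), len(M - (A | B)))

# 'most' depends only on |A ∩ B| and |A \ B|, so it is quantitative:
most_q = from_counts(lambda ab, a_only, b_only, rest: ab > a_only)

# Not quantitative: truth depends on the identity of a specific individual.
def john_and_some_other(M, A, B, john="john"):
    return john in (A & B) and len(A & B) >= 2
```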
Quantity
Keenan and Stavi 1986, p. 311: “Monomorphemic dets are logical [= permutation-closed].”
Peters and Westerstahl 2006, p. 330: “All lexical quantifier expressions in natural languages denote ISOM [= isomorphism-closed] quantifiers.”
Quantity Universal
All simple determiners are quantitative.
Conservativity
Key intuition: the restrictor restricts what the determiner talks about
Many French people smoke cigarettes ≡ Many French people are French people who smoke cigarettes
Q is conservative iff: 〈M,A,B〉 ∈ Q if and only if 〈M,A,A ∩ B〉 ∈ Q
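Conservativity can likewise be checked by brute force on a finite universe, reusing `subsets` and `every` from the earlier sketches (illustrative only):

```python
def is_conservative(Q, M):
    """Check that Q(M, A, B) holds iff Q(M, A, A ∩ B) does, for all A, B ⊆ M."""
    return all(Q(M, A, B) == Q(M, A, A & B)
               for A in subsets(M) for B in subsets(M))

# 'only' (every B is an A) looks at B \ A, so it fails conservativity:
only = lambda M, A, B: B <= A
print(is_conservative(every, {1, 2}))  # True
print(is_conservative(only, {1, 2}))   # False
```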
Conservativity Universal (Barwise and Cooper 1981)
All simple determiners are conservative.
Explaining Universals
Natural Question
Why do the universals hold? What explains the limited range of quantifiers expressed by simple determiners?
Answer 1: learnability. (Barwise and Cooper 1981; Keenan and Stavi 1986; Szabolcsi 2010)
The universals greatly restrict the search space that a language learner must explore when learning the meanings of determiners. This makes it easier (possible?) for them to learn such meanings from relatively small input. [Compare: Poverty of the Stimulus argument for UG.]
In a sense this must be true, but: “Likely, the unrestricted space has many hypotheses which are so implausible, they can be ignored quickly and do not affect learning. The hard part of learning may be choosing between the plausible competitor meanings, not in weeding out a large space of potential meanings.” (Piantadosi 2013, p. 22)
Answer 2: learnability. (Peters and Westerstahl 2006)
The universals aid learnability because quantifiers satisfying the universals are easier to learn than those that do not.
Challenge: provide a model of learning which makes good on this promise. [See Tiede 1999; Gierasimczuk 2007; Gierasimczuk 2009 for earlier attempts.]
Overview
1 Semantic Universals
2 The Learning Model
3 Experiments
4 Conclusion
Neural Networks
Source: Nielsen, “Neural Networks and Deep Learning”, Determination Press
Gradient Descent and Back-propagation
The learning framework will be a non-convex optimization problem.
A total loss function, which will be the mean of a ‘local’ error function:
L = (1/N) Σ_i ℓ(ŷ_i, y_i) = (1/N) Σ_i ℓ(NN(θ, x_i), y_i)
Navigate the landscape of weight space towards lower loss:
θ_{t+1} ← θ_t − α ∇_θ L
Back-propagation: calculate ℓ(·, ·) after a forward pass of the network. ‘Propagate’ the error backwards through the network to calculate the gradient.
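As an illustration of the update rule θ_{t+1} ← θ_t − α ∇_θ L (a toy sketch, not the talk's training code), here is a gradient-descent loop in numpy with the gradient estimated by finite differences; back-propagation computes the same gradient exactly and far more cheaply.

```python
import numpy as np

def gradient_descent_step(loss, theta, alpha=0.1, eps=1e-6):
    """One update: theta <- theta - alpha * grad(loss)(theta)."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        bump = np.zeros_like(theta)
        bump[i] = eps
        # Central finite-difference estimate of the i-th partial derivative.
        grad[i] = (loss(theta + bump) - loss(theta - bump)) / (2 * eps)
    return theta - alpha * grad

# Toy convex loss with minimum at theta = [1, -2]; a neural network's loss
# landscape is non-convex, but the update rule is the same.
loss = lambda th: float(np.sum((th - np.array([1.0, -2.0])) ** 2))
theta = np.zeros(2)
for _ in range(100):
    theta = gradient_descent_step(loss, theta)
print(theta)   # approximately [ 1. -2.]
```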
Recurrent Neural Networks
Source: Olah, “Understanding LSTM Networks”, http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Long Short-Term Memory
Introduced in Hochreiter and Schmidhuber 1997 to solve the exploding/vanishing gradients problem.
Source: Olah, “Understanding LSTM Networks”, http://colah.github.io/posts/2015-08-Understanding-LSTMs/
The Learning Task
Our data will be the following:
Input: 〈Q, M〉 pairs, presented sequentially
Output: T/F, depending on whether M ∈ Q
So, this will be a sequence classification task.
The loss function we minimize is cross-entropy. In our case, with y ∈ {0, 1}, very simple:
ℓ(NN(θ, x), y) = −ln(NN(θ, x)_y)
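A sketch of how such a data point might be encoded and scored (my rendering, not necessarily the exact encoding in the talk's code): each individual of the model is presented as a one-hot vector marking which of the four zones A ∩ B, A \ B, B \ A, M \ (A ∪ B) it occupies, the quantifier's identity is given alongside (e.g. as another one-hot label), and the per-example loss is −ln of the probability the network assigns to the correct truth value.

```python
import numpy as np

ZONES = ["A∩B", "A\\B", "B\\A", "M\\(A∪B)"]

def encode_model(M, A, B, order):
    """One one-hot vector per individual, in the given presentation order."""
    def zone(x):
        if x in A and x in B: return 0
        if x in A:            return 1
        if x in B:            return 2
        return 3
    seq = np.zeros((len(order), len(ZONES)))
    for t, x in enumerate(order):
        seq[t, zone(x)] = 1.0
    return seq

def cross_entropy(probs, y):
    """ℓ(NN(θ, x), y) = −ln p_y, with probs = (p_False, p_True) and y in {0, 1}."""
    return -np.log(probs[y])

print(cross_entropy(np.array([0.1, 0.9]), 1))  # ≈ 0.105: confident and correct
print(cross_entropy(np.array([0.9, 0.1]), 1))  # ≈ 2.303: confident and wrong
```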
Data Generation
Algorithm 1 Data Generation Algorithm
Inputs: max_len, num_data, quants
data ← []
while len(data) < num_data do
    N ∼ Unif([max_len])
    Q ∼ Unif(quants)
    cur_seq ← Unif({A ∩ B, A \ B, B \ A, M \ (A ∪ B)}, N)
    if 〈Q, cur_seq〉 ∉ data then
        data.append(generate_point(〈Q, cur_seq, Q ∈ cur_seq?〉))
    end if
end while
return shuffle(data)
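A runnable Python reconstruction of this algorithm (hedged: the function and quantifier names are mine, and the authors' repository, linked on a later slide, may differ in details such as how duplicates and labels are handled):

```python
import random

ZONES = ["A∩B", "A\\B", "B\\A", "M\\(A∪B)"]

def generate_data(max_len, num_data, quants):
    """Sample <quantifier, model-sequence> pairs of random length, label each
    by whether the model is in the quantifier, and skip exact duplicates."""
    data, seen = [], set()
    while len(data) < num_data:
        N = random.randint(1, max_len)
        name, Q = random.choice(list(quants.items()))
        seq = tuple(random.choice(ZONES) for _ in range(N))
        if (name, seq) in seen:
            continue
        seen.add((name, seq))
        data.append((name, seq, Q(seq)))
    random.shuffle(data)
    return data

# Quantifiers as predicates on zone sequences (one plausible rendering).
quants = {
    "at_least_3": lambda seq: seq.count("A∩B") >= 3,
    "first_3": lambda seq: [z for z in seq if z in ("A∩B", "A\\B")][:3] == ["A∩B"] * 3,
}
print(generate_data(max_len=8, num_data=5, quants=quants))
```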
Overview
1 Semantic Universals
2 The Learning Model
3 Experiments
4 Conclusion
General Set-up
For each proposed semantic universal:
Choose a set of minimally different quantifiers that disagree with respect to the universal
Run some number of trials of training an LSTM to learn those quantifiers
Stop when: the 500-minibatch running mean of total accuracy on the test set is > 0.99, or the mean probability assigned to the correct truth-value on the test set is > 0.99
Measure, for each quantifier, its convergence point: the first i such that mean accuracy from i onward for Q is > 0.98 (see the sketch below)
Code and Data Available At: http://github.com/shanest/quantifier-rnn-learning
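A direct reading of the convergence-point definition (the linked repository may compute it slightly differently):

```python
def convergence_point(accuracies, threshold=0.98):
    """First index i such that the per-quantifier accuracy stays above the
    threshold from i onward; None if it never converges."""
    for i in range(len(accuracies)):
        if all(a > threshold for a in accuracies[i:]):
            return i
    return None

print(convergence_point([0.50, 0.70, 0.99, 0.97, 0.99, 0.99]))  # 4
```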
Experiment 1: Monotonicity
Monotonicity Universal (Barwise and Cooper 1981)
All simple determiners are monotone.
Quantifiers: ≥ 4, ≤ 4, = 4
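Using the zone-sequence encoding sketched earlier, these three quantifiers differ only in the comparison applied to |A ∩ B| (an illustrative rendering, not the repository's code):

```python
# ≥ 4 is upward monotone, ≤ 4 is downward monotone, = 4 is non-monotone
# (it is the conjunction of the other two).
at_least_4 = lambda seq: seq.count("A∩B") >= 4
at_most_4  = lambda seq: seq.count("A∩B") <= 4
exactly_4  = lambda seq: seq.count("A∩B") == 4
```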
Monotonicity: Results
Monotonicity: Discussion
≥ 4 easier than = 4: ✓
≥ 4 easier than ≤ 4, and ≤ 4 not easier than = 4: ???
Upward monotone quantifiers are generally cognitively easier than downward [Just and Carpenter 1971; Geurts 2003; Geurts and Slik 2005; Deschamps et al. 2015]
= 4 is a conjunction of monotone quantifiers
A better measure of learning rate than convergence point?
Experiment 2: Quantity
Quantity Universal
All simple determiners are quantitative.
Quantifiers: ≥ 3, first3
Quantity: Results
Quantity: Discussion
Very promising initial results
Harder to learn an order-sensitive quantifier than one that is only sensitive to quantity
Experiment 3: Conservativity
Conservativity Universal (Barwise and Cooper 1981)
All simple determiners are conservative.
Quantifiers: nall, nonly [Motivated by Hunter and Lidz 2013]
Conservativity: Results
Conservativity: Discussion
No way of ‘breaking the symmetry’ between A ∩ B and B \ A in this model
More boldly: conservativity as a syntactic/structural constraint, not a semantic universal [See Fox 2002; Sportiche 2005; Romoli 2015]
Overview
1 Semantic Universals
2 The Learning Model
3 Experiments
4 Conclusion
Towards Meeting the Challenge
Challenge: find a model of learnability on which quantifiers satisfying a universal are easier to learn than those that do not.
We showed how to train LSTM networks via backpropagation to verify quantifiers.
Initial experiments show that this setup may be able to meet the challenge.
Future Work
More and larger experiments
Different models, including baselines for this task
More realistic visual input (see Sorodoc, Lazaridou, et al. 2016; Sorodoc, Pezzelle, et al. 2017)
Tools to ‘look inside’ the black box and see what the network is doing
The End
Thank you!