
Learnability and Semantic Universals

Shane Steinert-Threlkeld (joint work-in-progress with Jakub Szymanik)

Cognitive Semantics and Quantities Kick-off Workshop, September 28, 2017

Overview

1 Semantic Universals
2 The Learning Model
3 Experiments
4 Conclusion

Universals in Linguistic Theory

Question
What is the range of variation in human languages? That is: which, out of all of the logically possible languages that humans could speak, do they in fact speak?

Example Universals

Sound (phonology): All languages have consonants and vowels. Every language has at least /a i u/.

Grammar (syntax): All languages have verbs and nouns.

Meaning (semantics): All languages have syntactic constituents (NPs) whose semantic function is to express generalized quantifiers. (Barwise and Cooper 1981)


Determiners

Determiners:
Simple: every, some, few, most, five, ...
Complex: all but five, fewer than three, at least eight or fewer than five, ...

Determiners denote type ⟨1, 1⟩ generalized quantifiers: sets of models of the form ⟨M, A, B⟩ with A, B ⊆ M.

For example:
⟦every⟧ = {⟨M, A, B⟩ : A ⊆ B}
⟦most⟧ = {⟨M, A, B⟩ : |A ∩ B| > |A \ B|}
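To make these denotations concrete, here is a minimal Python sketch (an illustration of the definitions above, not code from the talk) treating type ⟨1, 1⟩ quantifiers as predicates over finite models ⟨M, A, B⟩:

# Minimal sketch: type <1,1> generalized quantifiers as predicates
# over finite models <M, A, B>, with A, B subsets of M.

def every(M, A, B):
    """<M, A, B> is in [[every]] iff A is a subset of B."""
    return A <= B

def most(M, A, B):
    """<M, A, B> is in [[most]] iff |A & B| > |A \\ B|."""
    return len(A & B) > len(A - B)

M = {1, 2, 3, 4}
A = {1, 2, 3}          # e.g. the French people
B = {1, 2}             # e.g. the smokers

print(every(M, A, B))  # False: 3 is in A but not in B
print(most(M, A, B))   # True: |A & B| = 2 > |A - B| = 1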


Monotonicity

Many French people smoke cigarettes ⇒ Many French people smoke
Few French people smoke ⇒ Few French people smoke cigarettes

Q is upward (resp. downward) monotone iff: if ⟨M, A, B⟩ ∈ Q and B ⊆ B′ (resp. B′ ⊆ B), then ⟨M, A, B′⟩ ∈ Q.

A determiner is monotone if it denotes either an upward or downward monotone generalized quantifier.
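Since the definition quantifies over all models, it can be checked by brute force on a small domain; the following sketch (my illustration, not from the talk) tests upward monotonicity, with "at least two" passing and "exactly two" failing:

from itertools import combinations

def subsets(S):
    """All subsets of the finite set S."""
    S = list(S)
    return [set(c) for r in range(len(S) + 1) for c in combinations(S, r)]

def is_upward_monotone(q, M):
    """Brute-force check: whenever q(M, A, B) holds and B <= B2,
    q(M, A, B2) must hold as well."""
    for A in subsets(M):
        for B in subsets(M):
            if not q(M, A, B):
                continue
            for B2 in subsets(M):
                if B <= B2 and not q(M, A, B2):
                    return False
    return True

M = {0, 1, 2}
at_least_two = lambda M, A, B: len(A & B) >= 2
exactly_two = lambda M, A, B: len(A & B) == 2
print(is_upward_monotone(at_least_two, M))  # True
print(is_upward_monotone(exactly_two, M))   # False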


Monotonicity Universal (Barwise and Cooper 1981)

All simple determiners are monotone.

Quantity

Key intuition: determiners denote general relations between their restrictor and nuclear scope, not ones that depend on the identities of particular elements of the domain.

Q is permutation-closed iff: if ⟨M, A, B⟩ ∈ Q and π is a permutation of M, then π(⟨M, A, B⟩) ∈ Q.

Q is isomorphism-closed iff: if ⟨M, A, B⟩ ∈ Q and ⟨M, A, B⟩ ≅ ⟨M′, A′, B′⟩, then ⟨M′, A′, B′⟩ ∈ Q.

Q is quantitative iff: whenever ⟨M, A, B⟩ ∈ Q and A ∩ B, A \ B, B \ A, M \ (A ∪ B) have the same cardinalities as their primed counterparts in ⟨M′, A′, B′⟩, then ⟨M′, A′, B′⟩ ∈ Q.
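Equivalently, over a fixed domain a quantitative quantifier's truth value is a function of the four cardinalities |A ∩ B|, |A \ B|, |B \ A|, |M \ (A ∪ B)|. A small sketch (my illustration; note that it checks the condition only within one small domain, not across domains):

from itertools import combinations

def subsets(S):
    S = list(S)
    return [set(c) for r in range(len(S) + 1) for c in combinations(S, r)]

def profile(M, A, B):
    """The four cardinalities a quantitative quantifier may depend on."""
    return (len(A & B), len(A - B), len(B - A), len(M - (A | B)))

def is_quantitative(q, models):
    """q is quantitative iff its truth value is a function of profile(M, A, B)."""
    seen = {}
    for M, A, B in models:
        key, val = profile(M, A, B), q(M, A, B)
        if seen.setdefault(key, val) != val:
            return False
    return True

M = {0, 1, 2}
models = [(M, A, B) for A in subsets(M) for B in subsets(M)]
print(is_quantitative(lambda M, A, B: len(A & B) > len(A - B), models))  # "most": True
print(is_quantitative(lambda M, A, B: 0 in (A & B), models))             # "element 0 is an A that Bs": False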


Quantity

Keenan and Stavi 1986, p. 311: "Monomorphemic dets are logical [= permutation-closed]."

Peters and Westerståhl 2006, p. 330: "All lexical quantifier expressions in natural languages denote ISOM [= isomorphism-closed] quantifiers."


Quantity Universal

All simple determiners are quantitative.

Conservativity

Key intuition: the restrictor restricts what the determiner talks about.

Many French people smoke cigarettes ≡ Many French people are French people who smoke cigarettes

Q is conservative iff: ⟨M, A, B⟩ ∈ Q if and only if ⟨M, A, A ∩ B⟩ ∈ Q.
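This biconditional is also easy to test exhaustively on a small domain. In the sketch below (mine, not from the talk), "every" passes while an "only"-style quantifier, ⟦only⟧ = {⟨M, A, B⟩ : B ⊆ A}, fails:

from itertools import combinations

def subsets(S):
    S = list(S)
    return [set(c) for r in range(len(S) + 1) for c in combinations(S, r)]

def all_models(M):
    return [(M, A, B) for A in subsets(M) for B in subsets(M)]

def is_conservative(q, models):
    """Conservativity: q(M, A, B) iff q(M, A, A & B), for every model."""
    return all(q(M, A, B) == q(M, A, A & B) for M, A, B in models)

models = all_models({0, 1, 2})
print(is_conservative(lambda M, A, B: A <= B, models))  # "every": True
print(is_conservative(lambda M, A, B: B <= A, models))  # "only": False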


Conservativity Universal (Barwise and Cooper 1981)

All simple determiners are conservative.

Explaining Universals

Natural Question
Why do the universals hold? What explains the limited range of quantifiers expressed by simple determiners?

Answer 1: learnability. (Barwise and Cooper 1981; Keenan and Stavi 1986; Szabolcsi 2010)

The universals greatly restrict the search space that a language learner must explore when learning the meanings of determiners. This makes it easier (possible?) for them to learn such meanings from relatively small input. [Compare: the Poverty of the Stimulus argument for UG.]

In a sense this must be true, but: "Likely, the unrestricted space has many hypotheses which are so implausible, they can be ignored quickly and do not affect learning. The hard part of learning may be choosing between the plausible competitor meanings, not in weeding out a large space of potential meanings." (Piantadosi 2013, p. 22)


Answer 2: learnability. (Peters and Westerståhl 2006)

The universals aid learnability because quantifiers satisfying the universals are easier to learn than those that do not satisfy them.

Challenge: provide a model of learning which makes good on this promise. [See Tiede 1999; Gierasimczuk 2007; Gierasimczuk 2009 for earlier attempts.]



Neural Networks

[Figure: neural network diagram. Source: Nielsen, "Neural Networks and Deep Learning", Determination Press]

Gradient Descent and Back-propagation

The learning framework will be a non-convex optimization problem.

A total loss function L, which will be the mean of a 'local' error function ℓ:

L = (1/N) ∑_i ℓ(ŷ_i, y_i) = (1/N) ∑_i ℓ(NN(θ, x_i), y_i)

Navigate the landscape of weight space towards lower loss:

θ_{t+1} ← θ_t − α ∇_θ L

Back-propagation: calculate ℓ(·, ·) after a forward pass of the network, then 'propagate' the error backwards through the network to calculate the gradient.
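As a toy illustration of the update rule (mine, not from the talk), gradient descent on a one-parameter least-squares loss recovers the true slope:

import numpy as np

# Toy data: y = 3x plus noise; loss L(theta) = mean((theta * x - y)^2).
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + 0.1 * rng.normal(size=100)

theta, alpha = 0.0, 0.1
for t in range(100):
    grad = np.mean(2 * (theta * x - y) * x)  # dL/dtheta
    theta = theta - alpha * grad             # theta_{t+1} <- theta_t - alpha * grad
print(theta)  # approximately 3.0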


Recurrent Neural Networks

[Figure: recurrent neural network diagram. Source: Olah, "Understanding LSTM Networks", http://colah.github.io/posts/2015-08-Understanding-LSTMs/]

Long Short-Term Memory

Introduced in Hochreiter and Schmidhuber 1997 to solve the exploding/vanishing gradients problem.

[Figure: LSTM cell diagram. Source: Olah, "Understanding LSTM Networks", http://colah.github.io/posts/2015-08-Understanding-LSTMs/]
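For orientation, a minimal sketch of an LSTM sequence classifier of the kind used below (assuming PyTorch; the dimensions are arbitrary, and this is not the code from the linked repository):

import torch
import torch.nn as nn

class SeqClassifier(nn.Module):
    """An LSTM reads a sequence of one-hot element codes; a linear layer
    maps the final hidden state to logits over {False, True}."""
    def __init__(self, input_size=4, hidden_size=12, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, num_classes)

    def forward(self, x):          # x: (batch, seq_len, input_size)
        _, (h, _) = self.lstm(x)   # h: (num_layers, batch, hidden_size)
        return self.out(h[-1])     # logits: (batch, num_classes)

model = SeqClassifier()
x = torch.zeros(8, 10, 4)          # a batch of 8 sequences of length 10
x[..., 0] = 1.0                    # dummy one-hot inputs
loss = nn.functional.cross_entropy(model(x), torch.ones(8, dtype=torch.long))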

The Learning Task

Our data will be the following:
Input: ⟨Q, M⟩ pairs, presented sequentially
Output: T/F, depending on whether M ∈ Q

So, this will be a sequence classification task.

The loss function we minimize is cross-entropy. In our case, with y ∈ {0, 1}, it is very simple:

ℓ(NN(θ, x), y) = −ln(NN(θ, x)_y)
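Numerically (a minimal sketch of the formula above, assuming the network outputs a probability distribution over {0, 1}):

import numpy as np

def cross_entropy(probs, y):
    """probs: network output distribution over {0, 1}; y: true label.
    Implements l(NN(theta, x), y) = -ln(NN(theta, x)_y)."""
    return -np.log(probs[y])

print(cross_entropy(np.array([0.1, 0.9]), 1))  # confident and correct: ~0.105
print(cross_entropy(np.array([0.1, 0.9]), 0))  # confident and wrong:   ~2.303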


Data Generation

Algorithm 1: Data Generation
Inputs: max_len, num_data, quants

data ← []
while len(data) < num_data do
    N ~ Unif([max_len])
    Q ~ Unif(quants)
    cur_seq ← Unif({A ∩ B, A \ B, B \ A, M \ (A ∪ B)}, N)
    if ⟨Q, cur_seq⟩ ∉ data then
        data.append(generate_point(⟨Q, cur_seq, cur_seq ∈ Q?⟩))
    end if
end while
return shuffle(data)
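A rough Python rendering of the algorithm (my reconstruction: zone labels stand in for the four regions, the quantifier is applied directly to the sequence to produce the label, and the generate_point encoding step is elided):

import random

ZONES = ["A&B", "A-B", "B-A", "M-(A|B)"]  # the four zones an element can occupy

def generate_data(max_len, num_data, quants):
    """Sample unique (quantifier, sequence) pairs, labeled by whether the
    quantifier is true of the sequence."""
    data, seen = [], set()
    while len(data) < num_data:
        n = random.randint(1, max_len)                   # N ~ Unif([max_len])
        q_name, q = random.choice(list(quants.items()))  # Q ~ Unif(quants)
        seq = tuple(random.choice(ZONES) for _ in range(n))
        if (q_name, seq) not in seen:
            seen.add((q_name, seq))
            data.append((q_name, seq, q(seq)))           # label: is seq in Q?
    random.shuffle(data)
    return data

quants = {"at_least_3": lambda seq: seq.count("A&B") >= 3}
print(generate_data(6, 5, quants))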


General Set-up

For each proposed semantic universal:

Choose a set of minimally different quantifiers that disagree with respect to the universal.

Run some number of trials of training an LSTM to learn those quantifiers.

Stop when: the running mean of total accuracy on the test set over 500 minibatches is > 0.99; or the mean probability assigned to the correct truth-value on the test set is > 0.99.

Measure, for each quantifier, its convergence point: the first i such that the mean accuracy from i onward for Q is > 0.98 (a sketch of this computation follows below).

Code and data available at: http://github.com/shanest/quantifier-rnn-learning
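A direct implementation of the convergence-point definition (my sketch, operating on a per-quantifier accuracy curve):

import numpy as np

def convergence_point(accs, threshold=0.98):
    """First index i such that the mean accuracy from i onward exceeds
    threshold; returns len(accs) if the run never converges."""
    accs = np.asarray(accs, dtype=float)
    for i in range(len(accs)):
        if accs[i:].mean() > threshold:
            return i
    return len(accs)

print(convergence_point([0.5, 0.7, 0.99, 0.98, 1.0, 0.99]))  # 2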


Experiment 1: Monotonicity

Monotonicity Universal (Barwise and Cooper 1981): All simple determiners are monotone.

Quantifiers: ≥ 4, ≤ 4, = 4


Monotonicity: Results

[Results figures omitted from the transcript.]

Monotonicity: Discussion

≥ 4 easier than = 4: ✓

≥ 4 easier than ≤ 4, and ≤ 4 not easier than = 4: ???

Upward monotone quantifiers are generally cognitively easier than downward monotone ones [Just and Carpenter 1971; Geurts 2003; Geurts and van der Slik 2005; Deschamps et al. 2015].

= 4 is a conjunction of monotone quantifiers.

A better measure of learning rate than convergence point?


Experiment 2: Quantity

Quantity Universal: All simple determiners are quantitative.

Quantifiers: ≥ 3, first3
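For illustration (my sketch, assuming that first3 is true just in case the first three elements of A, in presentation order, are also in B), the contrast between the quantitative ≥ 3 and the order-sensitive first3 on zone-label sequences:

def at_least_3(seq):
    """True iff at least three elements lie in both A and B (quantitative)."""
    return seq.count("A&B") >= 3

def first_3(seq):
    """Assumed reading: true iff the first three A-elements, in presentation
    order, are also Bs (order-sensitive, hence not quantitative)."""
    a_elements = [z for z in seq if z in ("A&B", "A-B")]
    return len(a_elements) >= 3 and all(z == "A&B" for z in a_elements[:3])

seq = ("A&B", "A-B", "A&B", "A&B")
print(at_least_3(seq))  # True: three elements are in both A and B
print(first_3(seq))     # False: the second A-element is not a B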


Quantity: Results

[Results figures omitted from the transcript.]

Quantity: Discussion

Very promising initial results.

Harder to learn an order-sensitive quantifier than one that is only sensitive to quantity.


Experiment 3: Conservativity

Conservativity Universal (Barwise and Cooper 1981): All simple determiners are conservative.

Quantifiers: nall, nonly [motivated by Hunter and Lidz 2013]


Conservativity: Results

[Results figures omitted from the transcript.]

Conservativity: Discussion

There is no way of 'breaking the symmetry' between A ∩ B and B \ A in this model.

More boldly: conservativity as a syntactic/structural constraint, not a semantic universal. [See Fox 2002; Sportiche 2005; Romoli 2015]



Towards Meeting the Challenge

Challenge: find a model of learnability on which quantifiers satisfying a universal are easier to learn than those that do not satisfy it.

We showed how to train LSTM networks via back-propagation to verify quantifiers.

Initial experiments show that this setup may be able to meet the challenge.


Future Work

More and larger experiments.

Different models, including baselines for this task.

More realistic visual input (see Sorodoc, Lazaridou, et al. 2016; Sorodoc, Pezzelle, et al. 2017).

Tools to 'look inside' the black box and see what the network is doing.


The End

Thank you!