learnability and semantic universals
TRANSCRIPT
Learnability and Semantic Universals
Shane Steinert-Threlkeld (joint work-in-progress with Jakub Szymanik)
Cognitive Semantics and Quantities Kick-off Workshop, September 28, 2017
Overview
1 Semantic Universals
2 The Learning Model
3 Experiments
4 Conclusion
Universals in Linguistic Theory
Question
What is the range of variation in human languages? That is: which, out of all of the logically possible languages that humans could speak, do they in fact speak?
Example Universals
Sound (phonology): All languages have consonants and vowels. Every language has at least /a i u/.
Grammar (syntax): All languages have verbs and nouns.
Meaning (semantics): All languages have syntactic constituents (NPs) whose semantic function is to express generalized quantifiers. (Barwise and Cooper 1981)
Determiners
Determiners:
Simple: every, some, few, most, five, . . .
Complex: all but five, fewer than three, at least eight or fewer than five, . . .
Denote type 〈1, 1〉 generalized quantifiers: sets of models of the form 〈M, A, B〉 with A, B ⊆ M
For example:
⟦every⟧ = {〈M, A, B〉 : A ⊆ B}
⟦most⟧ = {〈M, A, B〉 : |A ∩ B| > |A \ B|}
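For concreteness, these denotations can be coded directly as predicates on finite models. A minimal Python sketch (the function names and the toy model are illustrative, not from the talk):

```python
# A type-<1,1> generalized quantifier, viewed as a predicate on finite
# models <M, A, B> with A, B subsets of M (sets of hashable individuals).

def every(M, A, B):
    """<M, A, B> is in [[every]] iff A is a subset of B."""
    return A <= B                      # set inclusion

def most(M, A, B):
    """<M, A, B> is in [[most]] iff |A ∩ B| > |A \\ B|."""
    return len(A & B) > len(A - B)

# Toy model: A = students, B = smokers.
M = {1, 2, 3, 4}
A = {1, 2, 3}
B = {1, 2, 4}
print(every(M, A, B))   # False: individual 3 is in A but not in B
print(most(M, A, B))    # True: |A ∩ B| = 2 > |A \ B| = 1
```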
Monotonicity
Many French people smoke cigarettes ⇒ Many French people smoke
Few French people smoke ⇒ Few French people smoke cigarettes
Q is upward (resp. downward) monotone iff: if 〈M,A,B〉 ∈ Q and B ⊆ B′ (resp. B′ ⊆ B), then 〈M,A,B′〉 ∈ Q
A determiner is monotone if it denotes either an upward or downward monotone generalized quantifier
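On a finite universe, upward monotonicity can be checked by brute force. A small sketch, reusing `every` from the snippet above (purely illustrative, and exponential in |M|):

```python
from itertools import chain, combinations

def subsets(S):
    """All subsets of a finite set S."""
    S = list(S)
    return [set(c) for c in chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))]

def is_upward_monotone(Q, M):
    """Check that whenever Q(M, A, B) holds and B ⊆ B2, Q(M, A, B2) also holds."""
    for A in subsets(M):
        for B in subsets(M):
            if not Q(M, A, B):
                continue
            for B2 in subsets(M):
                if B <= B2 and not Q(M, A, B2):
                    return False
    return True

exactly_two = lambda M, A, B: len(A & B) == 2
print(is_upward_monotone(every, {1, 2, 3}))        # True
print(is_upward_monotone(exactly_two, {1, 2, 3}))  # False: enlarging B can break it
```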
Monotonicity Universal (Barwise and Cooper 1981)
All simple determiners are monotone.
Quantity
Key intuition: determiners denote general relations between their restrictor and nuclear scope, not ones that depend on the identities of particular elements of the domain
Q is permutation-closed iff: if 〈M,A,B〉 ∈ Q and π is a permutation of M, then π(〈M,A,B〉) ∈ Q
Q is isomorphism-closed iff: if 〈M,A,B〉 ∈ Q and 〈M,A,B〉 ≅ 〈M′,A′,B′〉, then 〈M′,A′,B′〉 ∈ Q
Q is quantitative iff: if 〈M,A,B〉 ∈ Q and A ∩ B, A \ B, B \ A, M \ (A ∪ B) have the same cardinality as their primed counterparts, then 〈M′,A′,B′〉 ∈ Q
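Quantitativity thus says that truth may depend only on the four cell cardinalities |A ∩ B|, |A \ B|, |B \ A|, |M \ (A ∪ B)|. A hedged sketch (illustrative names): any quantifier built from those four numbers is quantitative by construction, while one that tracks a particular individual is not.

```python
def from_counts(f):
    """Lift a function of the four cell cardinalities to a quantifier on
    models <M, A, B>; anything built this way only 'sees' the counts."""
    return lambda M, A, B: f(len(A & B), len(A - B), len(B - A), len(M - (A | B)))

# 'most' depends only on |A ∩ B| and |A \ B|, so it is quantitative:
most_q = from_counts(lambda ab, a_only, b_only, rest: ab > a_only)

# Not quantitative: truth depends on the identity of a specific individual.
def john_and_some_other(M, A, B, john="john"):
    return john in (A & B) and len(A & B) >= 2
```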
Quantity
Keenan and Stavi 1986, p. 311: “Monomorphemic dets are logical [= permutation-closed].”
Peters and Westerstahl 2006, p. 330: “All lexical quantifier expressions in natural languages denote ISOM [= isomorphism-closed] quantifiers.”
Quantity Universal
All simple determiners are quantitative.
Conservativity
Key intuition: the restrictor restricts what the determiner talks about
Many French people smoke cigarettes ≡ Many French people are French people who smoke cigarettes
Q is conservative iff: 〈M,A,B〉 ∈ Q if and only if 〈M,A,A ∩ B〉 ∈ Q
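Conservativity can likewise be checked by brute force on a finite universe, reusing `subsets` and `every` from the earlier sketches (illustrative only):

```python
def is_conservative(Q, M):
    """Check that Q(M, A, B) holds iff Q(M, A, A ∩ B) does, for all A, B ⊆ M."""
    return all(Q(M, A, B) == Q(M, A, A & B)
               for A in subsets(M) for B in subsets(M))

# 'only' (every B is an A) looks at B \ A, so it fails conservativity:
only = lambda M, A, B: B <= A
print(is_conservative(every, {1, 2}))  # True
print(is_conservative(only, {1, 2}))   # False
```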
Conservativity Universal (Barwise and Cooper 1981)
All simple determiners are conservative.
Explaining Universals
Natural Question
Why do the universals hold? What explains the limited range of quantifiers expressed by simple determiners?
Answer 1: learnability. (Barwise and Cooper 1981; Keenan and Stavi 1986; Szabolcsi 2010)
The universals greatly restrict the search space that a language learner must explore when learning the meanings of determiners. This makes it easier (possible?) for them to learn such meanings from relatively small input. [Compare: Poverty of the Stimulus argument for UG.]
In a sense this must be true, but: “Likely, the unrestricted space has many hypotheses which are so implausible, they can be ignored quickly and do not affect learning. The hard part of learning may be choosing between the plausible competitor meanings, not in weeding out a large space of potential meanings.” (Piantadosi 2013, p. 22)
Answer 2: learnability. (Peters and Westerstahl 2006)
The universals aid learnability because quantifiers satisfying the universals are easier to learn than those that do not.
Challenge: provide a model of learning which makes good on this promise. [See Tiede 1999; Gierasimczuk 2007; Gierasimczuk 2009 for earlier attempts.]
Overview
1 Semantic Universals
2 The Learning Model
3 Experiments
4 Conclusion
Neural Networks
Source: Nielsen, “Neural Networks and Deep Learning”, Determination Press
Gradient Descent and Back-propagation
The learning framework will be a non-convex optimization problem.
A total loss function, which will be the mean of a ‘local’ error function:
L = (1/N) Σ_i ℓ(ŷ_i, y_i) = (1/N) Σ_i ℓ(NN(θ, x_i), y_i)
Navigate the landscape of weight space towards lower loss:
θ_{t+1} ← θ_t − α ∇_θ L
Back-propagation: calculate ℓ(·, ·) after a forward pass of the network. ‘Propagate’ the error backwards through the network to calculate the gradient.
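As an illustration of the update rule θ_{t+1} ← θ_t − α ∇_θ L (a toy sketch, not the talk's training code), here is a gradient-descent loop in numpy with the gradient estimated by finite differences; back-propagation computes the same gradient exactly and far more cheaply.

```python
import numpy as np

def gradient_descent_step(loss, theta, alpha=0.1, eps=1e-6):
    """One update: theta <- theta - alpha * grad(loss)(theta)."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        bump = np.zeros_like(theta)
        bump[i] = eps
        # Central finite-difference estimate of the i-th partial derivative.
        grad[i] = (loss(theta + bump) - loss(theta - bump)) / (2 * eps)
    return theta - alpha * grad

# Toy convex loss with minimum at theta = [1, -2]; a neural network's loss
# landscape is non-convex, but the update rule is the same.
loss = lambda th: float(np.sum((th - np.array([1.0, -2.0])) ** 2))
theta = np.zeros(2)
for _ in range(100):
    theta = gradient_descent_step(loss, theta)
print(theta)   # approximately [ 1. -2.]
```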
Recurrent Neural Networks
Source: Olah, “Understanding LSTM Networks”, http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Long Short-Term Memory
Introduced in Hochreiter and Schmidhuber 1997 to solve the exploding/vanishing gradients problem.
Source: Olah, “Understanding LSTM Networks”, http://colah.github.io/posts/2015-08-Understanding-LSTMs/
The Learning Task
Our data will be the following:
Input: 〈Q, M〉 pairs, presented sequentially
Output: T/F, depending on whether M ∈ Q
So, this will be a sequence classification task.
The loss function we minimize is cross-entropy. In our case, with y ∈ {0, 1}, very simple:
ℓ(NN(θ, x), y) = −ln(NN(θ, x)_y)
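A sketch of how such a data point might be encoded and scored (my rendering, not necessarily the exact encoding in the talk's code): each individual of the model is presented as a one-hot vector marking which of the four zones A ∩ B, A \ B, B \ A, M \ (A ∪ B) it occupies, the quantifier's identity is given alongside (e.g. as another one-hot label), and the per-example loss is −ln of the probability the network assigns to the correct truth value.

```python
import numpy as np

ZONES = ["A∩B", "A\\B", "B\\A", "M\\(A∪B)"]

def encode_model(M, A, B, order):
    """One one-hot vector per individual, in the given presentation order."""
    def zone(x):
        if x in A and x in B: return 0
        if x in A:            return 1
        if x in B:            return 2
        return 3
    seq = np.zeros((len(order), len(ZONES)))
    for t, x in enumerate(order):
        seq[t, zone(x)] = 1.0
    return seq

def cross_entropy(probs, y):
    """ℓ(NN(θ, x), y) = −ln p_y, with probs = (p_False, p_True) and y in {0, 1}."""
    return -np.log(probs[y])

print(cross_entropy(np.array([0.1, 0.9]), 1))  # ≈ 0.105: confident and correct
print(cross_entropy(np.array([0.9, 0.1]), 1))  # ≈ 2.303: confident and wrong
```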
Data Generation
Algorithm 1 Data Generation Algorithm
Inputs: max_len, num_data, quants
data ← []
while len(data) < num_data do
    N ∼ Unif([max_len])
    Q ∼ Unif(quants)
    cur_seq ← Unif({A ∩ B, A \ B, B \ A, M \ (A ∪ B)}, N)
    if 〈Q, cur_seq〉 ∉ data then
        data.append(generate_point(〈Q, cur_seq, Q ∈ cur_seq?〉))
    end if
end while
return shuffle(data)
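A runnable Python reconstruction of this algorithm (hedged: the function and quantifier names are mine, and the authors' repository, linked on a later slide, may differ in details such as how duplicates and labels are handled):

```python
import random

ZONES = ["A∩B", "A\\B", "B\\A", "M\\(A∪B)"]

def generate_data(max_len, num_data, quants):
    """Sample <quantifier, model-sequence> pairs of random length, label each
    by whether the model is in the quantifier, and skip exact duplicates."""
    data, seen = [], set()
    while len(data) < num_data:
        N = random.randint(1, max_len)
        name, Q = random.choice(list(quants.items()))
        seq = tuple(random.choice(ZONES) for _ in range(N))
        if (name, seq) in seen:
            continue
        seen.add((name, seq))
        data.append((name, seq, Q(seq)))
    random.shuffle(data)
    return data

# Quantifiers as predicates on zone sequences (one plausible rendering).
quants = {
    "at_least_3": lambda seq: seq.count("A∩B") >= 3,
    "first_3": lambda seq: [z for z in seq if z in ("A∩B", "A\\B")][:3] == ["A∩B"] * 3,
}
print(generate_data(max_len=8, num_data=5, quants=quants))
```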
Overview
1 Semantic Universals
2 The Learning Model
3 Experiments
4 Conclusion
General Set-up
For each proposed semantic universal:
Choose a set of minimally different quantifiers that disagree with respect to the universal
Run some number of trials of training an LSTM to learn those quantifiers
Stop when: the 500-minibatch running mean of total accuracy on the test set is > 0.99, or the mean probability assigned to the correct truth-value on the test set is > 0.99
Measure, for each quantifier, its convergence point: the first i such that mean accuracy from i onward for Q is > 0.98 (see the sketch below)
Code and Data Available At: http://github.com/shanest/quantifier-rnn-learning
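A direct reading of the convergence-point definition (the linked repository may compute it slightly differently):

```python
def convergence_point(accuracies, threshold=0.98):
    """First index i such that the per-quantifier accuracy stays above the
    threshold from i onward; None if it never converges."""
    for i in range(len(accuracies)):
        if all(a > threshold for a in accuracies[i:]):
            return i
    return None

print(convergence_point([0.50, 0.70, 0.99, 0.97, 0.99, 0.99]))  # 4
```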
Experiment 1: Monotonicity
Monotonicity Universal (Barwise and Cooper 1981)
All simple determiners are monotone.
Quantifiers: ≥ 4, ≤ 4, = 4
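Using the zone-sequence encoding sketched earlier, these three quantifiers differ only in the comparison applied to |A ∩ B| (an illustrative rendering, not the repository's code):

```python
# ≥ 4 is upward monotone, ≤ 4 is downward monotone, = 4 is non-monotone
# (it is the conjunction of the other two).
at_least_4 = lambda seq: seq.count("A∩B") >= 4
at_most_4  = lambda seq: seq.count("A∩B") <= 4
exactly_4  = lambda seq: seq.count("A∩B") == 4
```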
Monotonicity: Results
Monotonicity: Discussion
≥ 4 easier than = 4: ✓
≥ 4 easier than ≤ 4, and ≤ 4 not easier than = 4: ???
Upward monotone quantifiers are generally cognitively easier than downward [Just and Carpenter 1971; Geurts 2003; Geurts and Slik 2005; Deschamps et al. 2015]
= 4 is a conjunction of monotone quantifiers
A better measure of learning rate than convergence point?
Experiment 2: Quantity
Quantity Universal
All simple determiners are quantitative.
Quantifiers: ≥ 3, first3
Quantity: Results
Quantity: Discussion
Very promising initial results
Harder to learn an order-sensitive quantifier than one that is only sensitive to quantity
Experiment 3: Conservativity
Conservativity Universal (Barwise and Cooper 1981)
All simple determiners are conservative.
Quantifiers: nall, nonly [Motivated by Hunter and Lidz 2013]
Conservativity: Results
Conservativity: Discussion
No way of ‘breaking the symmetry’ between A ∩ B and B \ A in this model
More boldly: conservativity as a syntactic/structural constraint, not a semantic universal [See Fox 2002; Sportiche 2005; Romoli 2015]
Overview
1 Semantic Universals
2 The Learning Model
3 Experiments
4 Conclusion
Towards Meeting the Challenge
Challenge: find a model of learnability on which quantifiers satisfying a universal are easier to learn than those that do not.
We showed how to train LSTM networks via backpropagation to verify quantifiers.
Initial experiments show that this setup may be able to meet the challenge.
Future Work
More and larger experiments
Different models, including baselines for this task
More realistic visual input (see Sorodoc, Lazaridou, et al. 2016; Sorodoc, Pezzelle, et al. 2017)
Tools to ‘look inside’ the black box and see what the network is doing
The End
Thank you!