Strengthened Readings of Scalars

A Closer Look at Strengthened Readings of Scalars

Mandy Simons1 & Tessa Warren2

1Carnegie Mellon University, Dept of Philosophy

2University of Pittsburgh, Learning Research & Development Center and Dept of Psychology

Abstract

The majority of the extensive experimental and theoretical literature on scalar

strengthening assumes that the phenomenon is uniform across all types of scalars. The

experiment reported here contributes to the growing evidence against scalar uniformity,

while also exploring the suggestion of van Tiel et al. (2014) that boundedness plays a role in

the observed variation. The current experiment utilizes a novel approach to exploring the

interpretation of scalars, and also investigates the content of strengthened interpretations.

Key words: language comprehension, scalar implicatures, experimental pragmatics, sentence processing

Introduction

Scalar terms include words such as some, cool, and like (v), which partition a semantic

scale. It has been argued that the semantics of scalar terms places a lower bound on their

interpretation (e.g. if Jane merely likes donuts, this falls below the lower bound of love)

but not an upper bound (e.g. if Jane loves donuts then we can still say that she likes them;

Horn, 1972, 1989). Nonetheless, sentences containing scalar terms are often taken to

imply an upper bound, giving rise to a strengthened interpretation. Examples are given in

(1)-(3) below.

(1) Some of John’s children are at home.
    Strengthened interpretation: Some of John’s children are at home, and at least one of his children is not at home.

(2) It was a cool day.
    Strengthened interpretation: The temperature was in the range characterizable as “cool” but not in the range characterizable as “cold”.

(3) Jane likes ice cream.
    Strengthened interpretation: Jane has positive feelings about ice cream, but not feelings that can be characterized by “love”.

Standard accounts of scalar strengthening (e.g. Horn, 1972; Gazdar, 1979; Geurts, 2010)

all depend on the following idea derived from Grice (1967): Speakers are expected to be

as informative as they can, consistent with current conversational goals and their current

information. The assumption that a speaker has been as informative as she can leads to

the inference that certain stronger alternatives to the content the speaker has expressed

are considered, by her, to be not assertible. In the simplest case, where the speaker is

assumed to be fully informed, it is assumed that this is because the stronger alternatives

are not believed true by the speaker. More recent accounts (e.g. Fox, 2007; Chierchia,

Fox, & Spector, 2012) grammaticize this kind of reasoning, but are still driven by

considerations of informativity.

A central debate about scalar strengthening in the current literature is whether it

involves ad hoc pragmatic reasoning on the part of the interpreter, or is a relatively

“shallow” phenomenon, involving lexically-given alternatives to specific scalar forms,

possibly encapsulated in the grammar. One question that bears on this debate is the

extent to which scalar inferences are variable. If scalar inference is pragmatic, we would

expect it to be sensitive both to features of context and to lexical pragmatic differences

among scalar terms. On the other hand, if scalar inference is simply dependent on the

availability of lexical alternatives, we might expect more homogeneity.

Two types of contextual variability have been widely acknowledged. First,

context influences whether or not a particular occurrence (token) of a scalar will be

strengthened: scalar effects are pragmatically canceled in some environments. Second,

the grainedness of strengthening is contextually variable. For example, if we care only

about whether or not all students passed a test, an utterance of Some students passed will

trigger the implication “not all,” but perhaps not the implication “not most”. However, if

we care about exactly how many passed, the same utterance might carry the “not most”

implication.

The current paper investigates a different kind of variability: the extent to which

strengthening varies across scalar types, that is, whether, for example, the term cool is

strengthened to “not cold” at a different rate than possibly is strengthened to “not

certainly”. This question is orthogonal to that of the contextual cancelation of scalar

effects or modification of grainedness, which, according to standard views, would affect

all scalars equally.

Current theories of scalar strengthening do not predict variability across types.

Perhaps for this reason, the experimental literature on scalars has been dominated by the

assumption that all scalar items behave in the same way, and therefore that findings

pertaining to any scalar item can safely be generalized to all scalars (cf. Doran, Baker,

McNabb, Larson, & Ward, 2009; van Tiel, van Miltenburg, Zevakhina, & Geurts, 2014).

Consistent with this, the experimental literature on scalar implicatures is almost entirely

based on only two scalar exemplars: the quantifiers (typically some), and the connective

or (van Tiel et al., 2014).

Another prediction of current theories of scalar strengthening is that strengthened

interpretations rule out both (contextually available) non-maximal values on the relevant

semantic scale and maximal values. Most existing experimental work on scalars

investigates whether or not sentences containing weak scalars are understood to be

consistent with the top of the associated semantic scale (e.g., Subjects are asked questions

like: “Jane says I ate some of the chips; does she mean that she did not eat all of the

chips?”) But an additional question is whether, and to what extent, weak scalars are

understood to be consistent with stronger but non-maximal values on the scale (i.e. if

some implies “not all,” does it equally imply “not most”?) The current study contributes

to the very small body of work investigating this (e.g., Zevakhina, 2012), or the related

question of whether non-minimal scalars trigger strengthening at the same rates as

minimal ones (e.g. Beltrama & Xiang, 2012). The aims of the current experiment are to: (1)

investigate whether there is variation among scalar items in the degree to which

unembedded occurrences (i.e. main clause occurrences not in the scope of any operator)

of these items give rise to scalar inferences, and (2) explore the content of strengthened

interpretations.

Two previous papers have reported experiments investigating the homogeneity

assumption. Doran et al. (2009) tested a range of scalars using a task in which

participants were instructed to take the perspective of a character named Literal Lucy,

who always interpreted statements literally, and from that perspective to judge the truth

value of a statement made by a second character. Doran et al. found considerable

variation in strengthening across the set of scalars they tested. However, the complexity

of Doran et al.’s perspective-taking task raises concerns about their findings, as does the

fact that the judgments they gathered were their participants’ guesses about the

interpretations of a third party.

Van Tiel et al. (2014) tested a wide range of scalars in two questionnaires.

Participants made inferences about a speaker’s mental model based on his or her use of

one of two contrasting scalars, as in the following example:

(4) John says: “She is intelligent.” Would you conclude from this that, according to John, she is brilliant?

Participants answered yes or no. In a second version of the survey, the statements to be

judged had more specific predicates and full noun phrases instead of pronouns. Both

surveys showed similar variation in how likely different scalars were to be strengthened.

Further experiments investigated what factors might account for the observed variation,

and found two properties that made a significant contribution: semantic distance and

boundedness. The current experiment follows up on van Tiel et al., using a different

method and a different conception of scalar inference.

Van Tiel et al. adopted an approach consistent with much of the current literature,

ultimately informed by Horn (1972) and Gazdar (1979), according to which scalar

inference is driven by the interpreter’s knowledge of a lexical scale (Horn scale)

associated with a given scalar item. A lexical scale is an ordered n-tuple of lexical items

standing in asymmetrical entailment relations. The lexical scales that would be invoked

to explain the inferences in (1)-(3) above are shown in (5)-(7) below:

(5) <all, most, many, a few, some>

(6) <frigid, cold, cool>

(7) <love, like>

Gazdar (1979) takes the elements in these scales to be semantic representations and not

expressions of English. He argues that this position is required for consistency with the

underlying Gricean idea, because “to read off [scalar implicatures] from the actual lexical

items given in the surface structure would be tantamount to treating them as conventional

implicatures” (p.56).

This view of scales has, though, largely disappeared from the literature, with the

result that scalar inferences are typically seen as inferences involving formally

determinable alternative sentences (e.g. Katzir, 2007). On this view, utterance of a

sentence S containing a weak scalar item implies the negation of sentences derivable

from S by replacing the weak scalar item with logically stronger ones (e.g. by replacing

cool with cold).
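
The substitution view just described can be made concrete with a small sketch. The Python below is illustrative only: the function name stronger_alternatives and the strongest-first list encoding of the scale in (6) are assumptions introduced here, not part of any cited account.

```python
import re

# Lexical scale (6), listed from strongest to weakest (an encoding chosen for this sketch).
TEMPERATURE_SCALE = ["frigid", "cold", "cool"]


def stronger_alternatives(sentence, weak_item, scale):
    """Return the sentences obtained by replacing weak_item with each stronger scale-mate."""
    if weak_item not in scale:
        raise ValueError(f"{weak_item!r} is not on the scale {scale}")
    # Items listed before weak_item are the logically stronger ones.
    stronger_items = scale[: scale.index(weak_item)]
    pattern = re.compile(r"\b" + re.escape(weak_item) + r"\b")
    return [pattern.sub(item, sentence, count=1) for item in stronger_items]


alternatives = stronger_alternatives("It was a cool day.", "cool", TEMPERATURE_SCALE)
# On the substitution view, uttering the original sentence implies
# the negation of each sentence in this list.
print(alternatives)
```

The sketch treats alternatives as purely formal objects: nothing in it refers to temperatures, only to strings, which is the feature of the substitution view that the following paragraphs contrast with reasoning over an underlying semantic scale.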

In contrast, we take scalar inference to be driven by reasoning about the

underlying semantic scale associated with scalar expressions: e.g.

quantitative/proportional scales associated with the quantifiers; scales of temperature

associated with temperature expressions; etc. We therefore take scalar strengthening to be

an inference that guides the interpreter’s construction of a mental model of the content

expressed by the speaker. We view lexical scales merely as realizations of the underlying

semantic scale over which reasoning takes place.

We nonetheless recognize the importance of the expressions in these associated

scales as vehicles for increasing the salience of alternative portions of the underlying

scale, and it is for this reason that our experiment was explicitly designed to avoid the use

of scale-mates in probing participants’ interpretations of scalars. The design was

motivated by Geurts and Pouscoulous (2009) (see also Pouscoulous, 2006; Geurts, 2009;

Doran et al., 2009), who found that experimental designs in which subjects are presented

with the scalar alternatives of a target scalar result in increased reports of strengthened

interpretations. Van Tiel et al. (2014) used such a design and justified it on the grounds

that a general heightening of scalar effects should not affect the relative frequency of

strengthening across scalars. However, it is unknown whether this heightening effect is

uniform across scalars. One contribution of the current paper is to offer a new method for

investigating scalar implicatures without presenting scalar alternatives.

The current study was designed to probe maximally natural scalar interpretation.

To this end, weak scalars were embedded in the relatively rich context of 3-6 sentence

paragraphs. Participants read the paragraphs and after each one judged the consistency or

inconsistency of each of seven sentences with the paragraph. To minimize the likelihood

that participants would focus on scalar interpretation, four of these sentences were fillers

that did not involve scalars. The remaining sentences probed scalar interpretation in a

novel way, avoiding the use of explicit scalar alternatives. Instead, these probe sentences

contained descriptions of events or states of affairs consistent with different readings of

the scalar sentences from the passage. For example, one passage contained the sentence

She noticed that many of her pencils were chewed on. To test whether many is judged

consistent with all without offering this as an explicit alternative, we asked participants to

judge whether the sentence 100% of her pencils were chewed on was consistent with the

passage. (Note that the phrase “100%” is not itself a scale mate of the term some on

standard views: first, scale mates must be equally lexicalized; some and “100%” are not;

also, they belong to different syntactic categories: some is a determiner, 100% is not.)

Although our method avoids explicit comparisons between scale mates, it may

encourage semantic/pragmatic reasoning about the meaning of the scalar term in its

context. We take this to be a benefit. Our goal is, in part, to investigate whether the

standard mechanistic approaches to scalar inferences – taking the scalar alternatives to be

linguistically given and the process of strengthening to be automatically triggered – are

consistent with ordinary interpretation, which happens in many different contexts. If

strengthening is affected (in ways going beyond granularity) by contextually induced

reasoning, this is important to take into account in our models of the process.

Experiment

Method

Participants

Forty-three native American-English speaking undergraduates from Carnegie

Mellon University received $8 each for completing the questionnaire.

Design and Stimuli

The experiment had a 9x3 repeated measures within-subjects design. The first

factor was the scalar. We tested 8 scalar words, each one associated with a non-maximal

point on an underlying scale. Target words were: cool, warm, good, like, many, some,

possible, and think. Possible was tested twice: once for a strengthening that excluded the

top of its scale (equivalent to the meaning of possible but not certain) and the other for a

strengthening that excluded a higher but not maximal point on its scale (roughly the

meaning of possible but not probable).

The target scalars were divided into three triads: {cool, good, possible1}, {many,

possible2, think}, {some, like, warm}. For each triad, we constructed twelve naturalistic

paragraphs consisting of 3-6 sentences, and included one instance of each scalar in each

paragraph. Every paragraph was about a different situation or scenario, and the scalars

were used with a variety of different argument types. For example, the verb like appeared

with direct objects that were activities, food, other people, etc. This variety allowed us to

sample strengthening across a range of naturalistic uses.

The second factor was the relation between the content of the probe sentence and

the semantic scale underlying the target scalar. For each scalar, we devised ways to

invoke a region of the underlying semantic scale clearly consistent with the bounded

(strengthened) reading of the scalar, and parallel ways to invoke a region of the scale

above that bound. Probe sentences were designed to test in two different ways whether

the target scalar was judged compatible with this higher region of the scale.

Unstrengthened-compatible (UNSTR) sentences described the relevant feature of the

passage as being above the expected bound, i.e. invoked a point or region at or close to

the top of the underlying semantic scale. These sentences would be judged consistent

with the passage only if the scalar item were given an unstrengthened interpretation.

Crucially, the unstrengthened-compatible prompts were simple, non-modal statements

(e.g. There was a 100% chance that Sally would run into Steven at the pool.) Sentences

in the Range condition (RNG) made reference to a range of values on the underlying

semantic scale, including points clearly consistent with the strengthened reading and

points lying above the upper bound induced by strengthening. The Range sentences

always included the words anywhere between modifying the given range so as to elicit a

response of “compatible with passage” only if all values on the identified range were

compatible (e.g. The temperature was anywhere between 32 and 60 degrees Fahrenheit,

or The chance of rain was anywhere between 30% and 100%). Because the Range

sentences included the values from the Unstrengthened-Compatible condition, we

expected that these conditions would pattern together.

In the third, Strengthened-compatible (STR), condition, sentences described the

relevant feature of the passage as being at a point or small range of values clearly within

the strengthened interpretation of the scalar. These sentences were expected to be judged

consistent with the passage on any reading of the target scalar (as strengthened-

compatible sentences would also be compatible with the unstrengthened reading of the

target item). See Table 1 for an example item from the cool, good, possible/certain triad.

(The complete materials are available at

http://www.cmu.edu/dietrich/philosophy/people/faculty/core-faculty/simons.html.)

** TABLE 1 ABOUT HERE **

Each of the 36 experimental paragraphs was followed by an Unstrengthened-

Compatible sentence for one scalar, a Strengthened-Compatible sentence for a different

scalar, and a Range sentence for the remaining scalar, as well as four filler sentences. Three

counter-balancing lists were created such that the assignment of Unstrengthened-

Compatible, Strengthened-Compatible, and Range sentences rotated in a Latin Square

design across the three scalars within each paragraph. The order in which scalars

appeared in a paragraph was varied. Paragraph presentation order was randomized, as

was the ordering of the statements following each passage. Filler sentences ranged in

their consistency with the paragraph, with approximately half of the fillers consistent and

half inconsistent. Some fillers were easy to judge as consistent or inconsistent with the

paragraph, others less so.
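
The Latin Square rotation described above can be sketched as follows. This is a minimal illustration, not the procedure actually used to build the lists: the function name latin_square_lists and the condition labels are assumptions introduced here, and the triad shown is just one of the three.

```python
CONDITIONS = ["UNSTR", "STR", "RNG"]


def latin_square_lists(scalars, conditions=CONDITIONS):
    """Build one scalar-to-condition assignment per counterbalancing list.

    List k shifts the conditions by k positions, so every scalar receives
    every condition exactly once across the lists.
    """
    n = len(conditions)
    if len(scalars) != n:
        raise ValueError("need exactly one scalar per condition")
    return [
        {scalar: conditions[(i + shift) % n] for i, scalar in enumerate(scalars)}
        for shift in range(n)
    ]


lists = latin_square_lists(["cool", "good", "possible"])
for number, assignment in enumerate(lists, start=1):
    print(f"List {number}: {assignment}")
```

Within any one list each paragraph contributes one probe per condition, and across the three lists each scalar in the paragraph is probed in all three conditions.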

Apparatus

The questionnaire used Qualtrics software and was administered via the web.

Procedure

Instructions read as follows: for each statement following the passage, “decide

whether or not the statement is consistent with what is, for you, the most natural way

of understanding the passage” (bold in original). There followed one example passage,

example statements, and example consistency judgments with explanations. Each

subsequent questionnaire page had a passage followed by seven statements. Participants

judged the consistency of each statement with the passage by pressing either an icon

labeled “consistent with passage”, or one labeled “not consistent with passage.” After

responding to all statements, participants clicked a “continue” button to move to the next

passage. Participants could only advance if every statement had been responded to, and

after moving on, it was not possible to return to an earlier passage. Participants were

encouraged to complete the questionnaire in one sitting; however, after accessing the

questionnaire they were able to save it and complete it later (but not to return to questions

that had already been completed).

Results

For the purposes of analysis, “consistent with passage” answers were coded as 0

and “inconsistent with passage” answers were coded as 1. Performance on the 96 filler

sentences that the authors judged to be most clearly consistent or inconsistent with the

passages was used to verify that participants understood and attended to the task. Average

performance on these fillers was 80% correct, and the minimum performance was 68%

correct. No participants were clear outliers. Figure 1 reports mean proportion of sentences

in each condition that were judged to be inconsistent with the passage.

**FIGURE 1 ABOUT HERE **

Data were analyzed using linear mixed effects logit models (Baayen, 2008).

The models were run in R (R Development Core Team, 2013; ver. 3.0.1) using the lmer()

function in the lme4 package (Bates, Maechler, & Bolker, 2013; ver. 0.999999-2). All

models included crossed random intercepts for participants and items; items were defined

as triads of strengthening conditions that queried the same scalar. Following Barr, Levy,

Scheepers, and Tily (2013), we included as much random slope structure as our models

would support while still converging. When full models would not converge, we dropped the

random slopes that captured the least variance until the model did converge.

Our first model investigated the effects of strengthening condition. Strengthening

condition was a fixed factor; by-participant random slopes for strengthening condition

and scalar and by-item random slopes for strengthening condition were included.

Strengthening condition was treatment coded with the Strengthened-Compatible

condition as the reference level, because it should be acceptable under any reading of the

target scalar, and as such provides a baseline to compare to the other conditions.

Participants were reliably more likely to judge the Unstrengthened-Compatible condition

inconsistent than the Strengthened-Compatible condition (estimated β=3.28, z=13.22,

p<.001), but there was no difference between the Range and Strengthened-Compatible

conditions (estimated β=-.12, z=.18, p=.51).
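
As a reading aid, the fixed-effects part of this first model can be written out as below. The notation is introduced here for illustration and does not come from the analysis scripts: y = 1 codes an “inconsistent” judgment by participant p on item i, UNSTR and RNG are 0/1 indicators (treatment coding with the Strengthened-Compatible condition as reference), and u and w are the participant and item random intercepts. Random slopes are omitted for readability.

```latex
\[
\operatorname{logit} P(y_{pi} = 1)
  = \beta_0
  + \beta_{\mathrm{U}}\,\mathrm{UNSTR}_{pi}
  + \beta_{\mathrm{R}}\,\mathrm{RNG}_{pi}
  + u_{p} + w_{i}
\]
```

On this reading, the two estimates reported above correspond to the Unstrengthened-Compatible and Range coefficients respectively, each expressing a difference from the Strengthened-Compatible baseline on the log-odds scale.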

Our second set of models tested two factors that might contribute to variation in

rates of strengthening across scalars. Following van Tiel et al. (2014), the first was the

boundedness of the scale. The second was whether the Unstrengthened-Compatible

condition tested the maximal point of a bounded scale. Note that van Tiel et al. did not

test this factor, as their conceptualization of scalars does not distinguish between the top of

the scale and the stronger scalar alternative. Think/know was left out of these analyses,

because though van Tiel et al. categorized it as bounded, its categorization is not

straightforward. Cool, warm, like, and good were coded as being on unbounded scales

(-.5), and possible/certain, some, many, and possible/probable were coded as being on

bounded scales (.5). For tests investigating the factor of testing the maximal point of a

bounded scale, possible/probable was switched to the -.5 group. Critically, differences in

rates of strengthening across scalars should be evident in the Unstrengthened-compatible

conditions but not the Strengthened-compatible conditions, so this factor was included in

the models with Unstrengthened-compatible conditions coded as .5 and Strengthened-

Compatible conditions coded as -.5. However, the logic of using the Strengthened-

compatible condition as a baseline depends on that condition being judged consistent

with the passage, and for some items this was rarely the case. We therefore limited

analyses to the 80 items that were judged inconsistent in the Strengthened-compatible

condition on less than 30% of observations.¹ These models had maximal random effects

structure. A model testing 2x2 fixed factors of boundedness and strengthening condition

showed a reliable interaction (estimated β=.96, z=.48, p=.048), such that boundedness did

not affect judgments in the Strengthened-compatible condition, but scalars from bounded

scales were more likely to be judged inconsistent in the Unstrengthened-compatible

condition than scalars from unbounded scales. There was also a main effect of

strengthening condition (estimated β=4.31, z=14.50, p<.001), with Unstrengthened-

compatible conditions more likely to be judged inconsistent than Strengthened-

compatible conditions. An almost-identical model testing the effect of being at the top of

a bounded scale and strengthening condition showed a similar, but stronger, interaction

(estimated β=1.58, z=2.71, p=.007), and a similar main effect of strengthening condition

(estimated β=4.63, z=13.84, p<.001).
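
The contrast coding and the item-inclusion rule described in this paragraph can be sketched as follows. This is an illustration under stated assumptions: the dictionary and function names are introduced here, and the response data in the usage example are invented placeholders, not results from the experiment.

```python
# Contrast codes as described in the text (think/know is excluded from these analyses).
BOUNDEDNESS = {
    "cool": -0.5, "warm": -0.5, "like": -0.5, "good": -0.5,
    "possible/certain": 0.5, "some": 0.5, "many": 0.5, "possible/probable": 0.5,
}
# For the top-of-bounded-scale factor, possible/probable moves to the -.5 group.
TOP_OF_BOUNDED_SCALE = {**BOUNDEDNESS, "possible/probable": -0.5}
CONDITION = {"UNSTR": 0.5, "STR": -0.5}


def usable_items(str_codes_by_item, cutoff=0.30):
    """Keep items whose Strengthened-compatible probe was rejected on less than `cutoff` of observations.

    str_codes_by_item maps an item id to its response codes in the
    Strengthened-compatible condition (1 = judged inconsistent, 0 = consistent).
    """
    return [
        item
        for item, codes in str_codes_by_item.items()
        if codes and sum(codes) / len(codes) < cutoff
    ]


# Invented placeholder responses for three hypothetical items.
example = {"item_a": [0, 0, 0, 1], "item_b": [1, 1, 0, 0], "item_c": [0, 0, 0, 0]}
print(usable_items(example))
```

Coding each two-level factor as -.5 and .5 centers the predictors, so each main effect is estimated at the midpoint of the other factor rather than at a reference level.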

Seven scalars that had previously been tested in van Tiel et al. (2014) were tested

in the current experiment (assuming that think/know and believe/know are equivalent). To

test the robustness of a scalar’s relative rate of strengthening across the two studies, we

computed Spearman’s rank correlation by ranking these scalars from least to most often

strengthened in each study and computing the correlation between these rankings. The

correlation coefficient was .86.
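
For readers unfamiliar with the statistic, the computation can be sketched as below. The function name is introduced here, the formula shown is the standard one for rankings without ties, and the two rankings in the usage example are invented placeholders rather than the rankings from either study.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank correlation for two tie-free rankings of the same items."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("rankings must have the same length (at least 2)")
    # Sum of squared rank differences across items.
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Invented placeholder rankings of seven scalars, least to most often strengthened.
study_one = [1, 2, 3, 4, 5, 6, 7]
study_two = [2, 1, 3, 4, 5, 7, 6]
print(round(spearman_rho(study_one, study_two), 2))
```

A coefficient near 1 means the two studies order the scalars almost identically, whatever the absolute strengthening rates in each study.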

¹ The choice of 30% as a cutoff was fundamentally arbitrary, but it balanced the need to avoid eliminating too much data against the need to keep only items for which the Strengthened-compatible condition was an appropriate baseline.

Two final analyses compared strengthening rates for individual scalars: possible

when the implicit comparison was probable versus when it was certain, and some versus

many, both of which contrasted with the meaning of all. Both models included maximal

random structure and analysis was again limited to items that were judged inconsistent in

the Strengthened-compatible condition on less than 30% of observations. The first model

tested the interaction of sum-coded fixed effects of possible/probable vs. possible/certain

and Unstrengthened-Compatible vs. Strengthened-Compatible. It revealed a reliable main

effect of strengthening condition (estimated β=5.10, z=11.07, p<.001), with the

Unstrengthened-Compatible condition judged inconsistent more often, as well as a

reliable interaction (estimated β=2.54, z=3.39, p<.001), such that there was a larger effect

of strengthening condition when possible was implicitly compared to certain than to

probable. The second model was identical, but replaced the possible factor with a sum-

coded comparison of some vs. many. This model revealed a main effect of strengthening

condition (estimated β=4.85, z=10.85, p<.001), but no interaction.

Discussion

The current findings advance our understanding of the factors influencing scalar

diversity and the content of strengthened interpretations. Consistent with findings

reported in Doran et al. (2009) and van Tiel et al. (2014), participants did not uniformly

strengthen across all scalars. Sentences containing the scalars good and think were almost

30% less likely to be strengthened than ones with the scalars many and possible. The

current experiment used richer and more natural stimuli than previous studies, used a

wider variety of contexts, eliminated the need for perspective switching by experiment

participants, included many fillers, and avoided using the critical scalar terms in

questions, yet the pattern of strengthening across scalars was very similar to the one van

Tiel et al. (2014) found. This suggests that a scalar’s relative rate of strengthening is quite

robust. It also suggests that the passages, although constructed by the experimenters,

did not introduce any unnoticed bias towards strengthened/unstrengthened

interpretations.

The current study also advances our understanding of what factors contribute to

the diversity of scalar strengthening. Consistent with van Tiel et al. (2014), the current

study found that weak scalar terms were more likely to be strengthened if their

underlying scale is bounded than if it is unbounded. However, this effect was primarily

driven by the three scalars on bounded scales for which the Unstrengthened-compatible

condition tested the maximal point on the scale. When possible, which is on a bounded

scale, was tested against Unstrengthened-compatible values that implemented probable,

it patterned with scalars from unbounded scales and differed reliably from the condition

in which possible was contrasted with the meaning of certain. This suggests that the

critical factor may not simply be the boundedness of the underlying scale, but that the

stronger alternative instantiates the critical bound. In van Tiel et al., this distinction could

not be made because there was no underlying scale; the scale was defined by the lexical

scalar alternatives. By definition in their experiment, then, the stronger scalar on a

bounded scale always instantiated a bound. Note that the lack of difference in

strengthening between some and many suggests that it is the critical bound that is

important, not the position of the weak scalar on the scale.

The current findings provide two sources of evidence against the prediction of most current

theories that a strengthened interpretation of a sentence containing a scalar

term will be inconsistent with any stronger scalar alternative. The first comes from the

two contrasts of possible. In one set of items, the Unstrengthened-Compatible sentences

for possible were consistent with the meaning of probable (a high but non-maximal

region on the probability scale). For example, one passage contained the sentence It was

possible that the store was still open. Participants seeing the Unstrengthened-Compatible

sentence subsequently judged the consistency of:

(8) It was 90-95% possible that the store was still open.

In another set of items, Unstrengthened-Compatible sentences were consistent with

certain (the maximal region on that scale). For example, one passage contained the

sentence It was possible it might rain, and participants seeing the Unstrengthened-

Compatible sentence subsequently judged the consistency of:

(9) The chance of rain was 100%

A reliable interaction between the contrast of possible and strengthening condition

indicates that possible is typically strengthened so as to exclude certain, but less typically

strengthened to exclude probable.² Note that neither the passages nor the test sentences

contained the word certain or explicitly contrasted probability and certainty, making it

unlikely that this effect was related to a difference in salience between what is probable

versus certain.

² An anonymous reviewer suggests that in the terminology used by forecasters, a chance of rain greater than 50% counts as probable. All that matters for our purposes is that probabilities in the 90-95% range still count as probable (and not as certain). Note that by using a high range for probable, we increase the likelihood of rejection of this condition.

The second source of evidence challenging standard assumptions about

strengthening is the finding that Range-condition sentences patterned with Strengthened-

Compatible sentences rather than with Unstrengthened-Compatible sentences. To see the

significance of this result, consider the example of a passage containing the sentence The

baby’s mother had sung him many of his favorite songs. For this passage, the sentences

were:

(10) (UNSTR-Comp) The baby’s mother had sung him 100% of his favorite songs.

     (STR-Comp) The baby’s mother had sung him between 70% - 90% of his favorite songs.

     (RNG) The baby’s mother had sung him anywhere between 70% - 100% of his favorite songs.

The fact that participants tended to reject Unstrengthened-Compatible sentences as

inconsistent but accept Strengthened-Compatible sentences suggests that their

interpretations of scalar terms exclude the higher points on the scale tested in the

Unstrengthened-Compatible conditions. However, if this exclusion were total, then

participants should have rejected Range and Unstrengthened-Compatible sentences at

similar rates. Instead they accepted Range and Strengthened-Compatible sentences at

similar rates. The high acceptability of Range sentences suggests that participants did not

categorically reject the possibility that the value is at the top of the scale, but rather

assigned it a very low probability. Note that this construal of the results assumes that

when participants accept the Range sentence as consistent with the passage, they are

indicating that none of the values within the range, including the maximal value, are

absolutely inconsistent with the passage. The reason we think participants are doing this

is that the range given in the Range sentences is always accompanied by the words

anywhere between, which emphasizes the non-zero possibility of every point within the

range. It is plausible that participants take some values within the range to be more likely

than others, but as long as they consider the highest point of the range to have a non-zero

probability, the similar patterning of the Strengthened-compatible and Range conditions

is highly informative and not predicted by standard models.
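The logic of this argument can be made concrete with a minimal sketch. Everything in it is an illustrative assumption rather than part of the experiment or its analysis: the two belief distributions (CATEGORICAL and GRADED), their probability values, the 0.5 threshold, and the decision rule in judged_consistent are all hypothetical. The rule encodes the construal described above: a sentence is judged consistent only if every value it mentions has non-zero probability and the values together carry enough probability mass.

```python
from typing import Dict, List

# Hypothetical beliefs about the percentage of favorite songs sung after
# reading "many". The numbers are illustrative only, not experimental data.
CATEGORICAL: Dict[int, float] = {70: 0.35, 80: 0.40, 90: 0.25, 100: 0.0}
GRADED: Dict[int, float] = {70: 0.34, 80: 0.39, 90: 0.24, 100: 0.03}

# Scale points mentioned by each test sentence in example (10).
SENTENCES: Dict[str, List[int]] = {
    "UNSTR-Comp": [100],
    "STR-Comp": [70, 80, 90],
    "RNG": [70, 80, 90, 100],
}


def judged_consistent(belief: Dict[int, float], values: List[int],
                      threshold: float = 0.5) -> bool:
    """Assumed decision rule: accept a sentence only if every mentioned value
    is possible and the mentioned values jointly reach the threshold."""
    every_value_possible = all(belief.get(v, 0.0) > 0 for v in values)
    mass = sum(belief.get(v, 0.0) for v in values)
    return every_value_possible and mass >= threshold


def predictions(belief: Dict[int, float]) -> Dict[str, bool]:
    """Predicted consistency judgment for each sentence type."""
    return {label: judged_consistent(belief, values)
            for label, values in SENTENCES.items()}


if __name__ == "__main__":
    print("categorical exclusion:", predictions(CATEGORICAL))
    print("graded exclusion:", predictions(GRADED))
```

Under these assumptions, total exclusion of the top of the scale predicts that Range sentences are rejected along with Unstrengthened-Compatible sentences, whereas a low but non-zero probability predicts that Range sentences are accepted along with Strengthened-Compatible sentences, which is the pattern observed.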

One aspect of the current study that could be considered either a strength or a weakness is that it grappled with the range of variability that characterizes scalars in natural language. We tested each scalar across multiple rich contexts, and found considerable variability. For example, the scalars some, many, and cool had relatively high rates of rejection for Strengthened-Compatible sentences. This may have been because we attempted to maintain consistency in the points of the scales tested in the different strengthening conditions; e.g., we set one range of temperatures to be consistent with cool and another to be consistent with cold, and used those ranges across the experiment. However, there was contextual variability in where the most appropriate value for the weak scalar fell on the underlying semantic scale, and it sometimes fell outside our set ranges. Given this variability, one might be concerned that inadvertent differences in contextual effects on strengthening within items (see Degen, 2015) might have affected the results. However, the fact that patterns of scalar strengthening were so similar across the current study and van Tiel et al. (2014) suggests that there are robust patterns of strengthening for individual scalars regardless of whether the context is minimal or rich and variable.

Nonetheless, given the extensive evidence of the influence of fine features of context on interpretation, we consider one of the most valuable contributions of this paper to be the innovative method for naturalistic investigation of scalar interpretation.

Acknowledgements

We would like to thank Eric Kummerfeld for assistance in creating the Qualtrics questionnaire, Scott Fraundorf for statistical advice, and the University of Pittsburgh Reading and Language Group and the audience of the 2014 CUNY Conference on Human Sentence Processing for helpful comments.

Tables

Sally went to the pool around 4 o’clock. She enjoyed swimming at the end of the day: she was a good swimmer and she loved how the swim left her feeling cool and refreshed. And although she wouldn’t have admitted it to anyone, she went to the pool in part because it was possible she would run into Steven there.

cool/cold

U: After swimming, Sally would be blue-lipped and shivering.

S: After swimming, Sally would not be hot, but would not be blue-lipped and shivering.

R: After swimming, Sally would be anywhere from comfortably not-hot to blue-lipped and shivering.

good/great

U: On a scale of 1-10, where 1 is the level of an absolute beginner swimmer and 10 is the level of a champion competitive swimmer, Sally ranks a 10.

S: On a scale of 1-10, where 1 is the level of an absolute beginner swimmer and 10 is the level of a champion competitive swimmer, Sally ranks a 7 or 8.

R: On a scale of 1-10, where 1 is the level of an absolute beginner swimmer and 10 is the level of a champion competitive swimmer, Sally could rank anywhere between 7 and 10.

possible/certain

U: There was a 100% chance that Sally would run into Steven at the pool.

S: There was a 30%-70% chance that Sally would run into Steven at the pool.

R: The chances of Sally running into Steven at the pool were anywhere between 30%-100%.

Table 1. Example item (U = Unstrengthened-Compatible, S = Strengthened-Compatible, R = Range)

Figures

Figure 1. Proportion of sentences judged inconsistent for each scalar in each strengthening condition. Error bars represent standard errors of the mean from ANOVAs.

References

Baayen, R. H. 2008: Analyzing linguistic data: A practical introduction to statistics using R. Cambridge, England: Cambridge University Press.

Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. 2013: Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255-278.

Bates, D. M., Maechler, M., & Bolker, B. 2013: lme4: Linear mixed-effects models using S4 classes. R package version 0.999999.2. http://CRAN.R-project.org/package=lme4.

Bott, L., & Noveck, I. A. 2004: Some utterances are underinformative: The onset and time course of scalar inferences. Journal of Memory and Language, 51(3), 437-457.

Carston, R. 1988: Implicature, explicature and truth-theoretic semantics. In R. Kempson (ed.), Mental Representations: The Interface between Language and Reality. Cambridge: Cambridge University Press, 155-81.

Chierchia, G. 2004: Scalar Implicatures, Polarity Phenomena, and the Syntax/Pragmatics Interface. In A. Belletti (ed.), Structures and Beyond. New York: Oxford University Press.

Chierchia, G., Fox, D., & Spector, B. 2010: The grammatical view of scalar implicatures and the relationship between semantics and pragmatics. In C. Maienborn, K. von Heusinger, & P. Portner (eds.), Semantics: An International Handbook of Natural Language Meaning. Berlin: Mouton de Gruyter.

Degen, J., & Tanenhaus, M. K. 2014: Processing scalar implicature: A constraint-based approach. Cognitive Science.

Doran, R., Baker, R. E., McNabb, Y., Larson, M., & Ward, G. 2009: On the non-unified nature of scalar implicature: An empirical investigation. International Review of Pragmatics, 1, 211-248.

Fox, D. 2007: Free Choice Disjunction and the Theory of Scalar Implicatures. In U. Sauerland & P. Stateva (eds.), Presupposition and Implicature in Compositional Semantics. Houndmills, Basingstoke, Hampshire: Palgrave Macmillan.

Gazdar, G. 1979: Pragmatics: Implicature, Presupposition, and Logical Form. New York: Academic Press.

Grodner, D. J., & Russell, B. 2013: Evidence for a rational probabilistic account of Gricean implicatures. Poster presented at the 26th Annual CUNY Conference on Human Sentence Processing. Columbia, SC.

Geurts, B. 2010: Quantity Implicatures. Cambridge: Cambridge University Press.

Geurts, B., & Pouscoulous, N. 2009: Embedded Implicatures?!? Semantics and Pragmatics, 2(4), 1-24.

Horn, L. R. 1972: On the semantic properties of the logical operators in English. Ph.D. dissertation, University of California at Los Angeles, Los Angeles, California.

Horn, L. R. 1989: A Natural History of Negation. Chicago: University of Chicago Press.

Noveck, I. A. 2000: When children are more logical than adults: Experimental investigations of scalar implicature. Cognition, 78(2), 165-188.

Papafragou, A., & Musolino, J. 2003: Scalar implicatures: experiments at the semantics–pragmatics interface. Cognition, 86(3), 253-282.

R Development Core Team. 2013: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org.

van Tiel, B., van Miltenburg, E., Zevakhina, N., & Geurts, B. 2014: Scalar Diversity. Journal of Semantics 0, 1-39. doi:10.1093/jos/ffu017
