Strengthened Readings of Scalars
A Closer Look at Strengthened Readings of Scalars
Mandy Simons1 & Tessa Warren2
1Carnegie Mellon University, Dept of Philosophy
2University of Pittsburgh, Learning & Research Devpt Center and Dept of Psychology
Abstract
The majority of the extensive experimental and theoretical literature on scalar
strengthening assumes that the phenomenon is uniform across all types of scalars. The
experiment reported here contributes to the growing evidence against scalar uniformity,
while also exploring the suggestion by van Tiel et al. (2014) that boundedness plays a role
in the observed variation. The current experiment uses a novel approach to exploring the
interpretation of scalars, and also investigates the content of strengthened interpretations.
Key words: language comprehension, scalar implicatures, experimental pragmatics, sentence processing
Introduction
Scalar terms include words such as some, cool, and like (v), which partition a semantic
scale. It has been argued that the semantics of scalar terms places a lower bound on their
interpretation (e.g. if Jane merely likes donuts, this falls below the lower bound of love)
but not an upper bound (e.g. if Jane loves donuts then we can still say that she likes them;
Horn, 1972, 1989). Nonetheless, sentences containing scalar terms are often taken to
imply an upper bound, giving rise to a strengthened interpretation. Examples are given in
(1)-(3) below.
(1) Some of John’s children are at home.
Strengthened interpretation: Some of John’s children are at home, and at least one of his children is not at home.
(2) It was a cool day.
Strengthened interpretation: The temperature was in the range characterizable as “cool” but not in the range characterizable as “cold”.
(3) Jane likes ice cream.
Strengthened interpretation: Jane has positive feelings about ice cream, but not feelings that can be characterized by “love”.
Standard accounts of scalar strengthening (e.g. Horn, 1972; Gazdar, 1979; Geurts, 2010)
all depend on the following idea derived from Grice (1967): Speakers are expected to be
as informative as they can, consistent with current conversational goals and their current
information. The assumption that a speaker has been as informative as she can leads to
the inference that certain stronger alternatives to the content the speaker has expressed
are considered, by her, to be not assertible. In the simplest case, where the speaker is
assumed to be fully informed, it is assumed that this is because the stronger alternatives
are not believed true by the speaker. More recent accounts (e.g. Fox, 2007; Chierchia,
Fox, & Spector, 2012) grammaticize this kind of reasoning, but are still driven by
considerations of informativity.
A central debate about scalar strengthening in the current literature is whether it
involves ad hoc pragmatic reasoning on the part of the interpreter, or is a relatively
“shallow” phenomenon, involving lexically-given alternatives to specific scalar forms,
possibly encapsulated in the grammar. One question that bears on this debate is the
extent to which scalar inferences are variable. If scalar inference is pragmatic, we would
expect it to be sensitive both to features of context and to lexical pragmatic differences
among scalar terms. On the other hand, if scalar inference is simply dependent on the
availability of lexical alternatives, we might expect more homogeneity.
Two types of contextual variability have been widely acknowledged. First,
context influences whether or not a particular occurrence (token) of a scalar will be
strengthened: scalar effects are pragmatically canceled in some environments. Second,
the grainedness of strengthening is contextually variable. For example, if we care only
about whether or not all students passed a test, an utterance of Some students passed will
trigger the implication “not all,” but perhaps not the implication “not most”. However, if
we care about exactly how many passed, the same utterance might carry the “not most”
implication.
The current paper investigates a different kind of variability: the extent to which
strengthening varies across scalar types. We ask whether, for example, the term cool is
strengthened to “not cold” at a different rate than possibly is strengthened to “not
certainly”. This question is orthogonal to the question of contextual cancelation of scalar
effects or modification of grainedness, which, according to standard views, would affect
all scalars equally.
Current theories of scalar strengthening do not predict variability across types.
Perhaps for this reason, the experimental literature on scalars has been dominated by the
assumption that all scalar items behave in the same way, and therefore that findings
pertaining to any scalar item can safely be generalized to all scalars (cf. Doran, Baker,
McNabb, Larson, & Ward, 2009; van Tiel, van Miltenburg, Zevakhina, & Geurts, 2014).
Consistent with this, the experimental literature on scalar implicatures is almost entirely
based on only two scalar exemplars: the quantifiers (typically some), and the connective
or (van Tiel et al., 2014).
Another prediction of current theories of scalar strengthening is that strengthened
interpretations rule out both (contextually available) non-maximal values on the relevant
semantic scale and maximal values. Most existing experimental work on scalars
investigates whether or not sentences containing weak scalars are understood to be
consistent with the top of the associated semantic scale (e.g., subjects are asked questions
like: “Jane says I ate some of the chips; does she mean that she did not eat all of the
chips?”). But an additional question is whether, and to what extent, weak scalars are
understood to be consistent with stronger but non-maximal values on the scale (i.e. if
some implies “not all,” does it equally imply “not most”?). The current study contributes
to the very small body of work investigating this (e.g., Zevakhina, 2012), or the related
question of whether non-minimal scalars trigger strengthening at the same rates as
minimal ones (e.g. Beltrama & Xiang, 2012). The aims of the current experiment are to: (1)
investigate whether there is variation among scalar items in the degree to which
unembedded occurrences (i.e. main clause occurrences not in the scope of any operator)
of these items give rise to scalar inferences, and (2) explore the content of strengthened
interpretations.
Two previous papers have reported experiments investigating the homogeneity
assumption. Doran et al. (2009) tested a range of scalars using a task in which
participants were instructed to take the perspective of a character named Literal Lucy,
who always interpreted statements literally, and from that perspective to judge the truth
value of a statement made by a second character. Doran et al. found considerable
variation in strengthening across the set of scalars they tested. However, the complexity
of Doran et al.’s perspective-taking task raises concerns about their findings, as does the
fact that the judgments they gathered were their participants’ guesses about the
interpretations of a third party.
Van Tiel et al. (2014) tested a wide range of scalars in two questionnaires.
Participants made inferences about a speaker’s mental model based on his or her use of
one of two contrasting scalars, as in the following example:
(4) John says: “She is intelligent.” Would you conclude from this that, according to John, she is brilliant?
Participants answered yes or no. In a second version of the survey, the statements to be
judged had more specific predicates and full noun phrases instead of pronouns. Both
surveys showed similar variation in how likely different scalars were to be strengthened.
Further experiments investigated what factors might account for the observed variation,
and found two properties that made a significant contribution: semantic distance and
boundedness. The current experiment follows up on van Tiel et al., using a different
method and a different conception of scalar inference.
Van Tiel et al. adopted an approach consistent with much of the current literature,
ultimately informed by Horn (1972) and Gazdar (1979), according to which scalar
inference is driven by the interpreter’s knowledge of a lexical scale (Horn scale)
associated with a given scalar item. A lexical scale is an ordered n-tuple of lexical items
standing in asymmetrical entailment relations. The lexical scales that would be invoked
to explain the inferences in (1)-(3) above are shown in (5)-(7) below:
(5) <all, most, many, a few, some>
(6) <frigid, cold, cool>
(7) <love, like>
Gazdar (1979) takes the elements in these scales to be semantic representations and not
expressions of English. He argues that this position is required for consistency with the
underlying Gricean idea, because “to read off [scalar implicatures] from the actual lexical
items given in the surface structure would be tantamount to treating them as conventional
implicatures” (p.56).
This view of scales has, though, largely disappeared from the literature, with the
result that scalar inferences are typically seen as inferences involving formally
determinable alternative sentences (e.g. Katzir, 2007). On this view, utterance of a
sentence S containing a weak scalar item implies the negation of sentences derivable
from S by replacing the weak scalar item with logically stronger ones (e.g. by replacing
cool with cold).
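The alternative-based view just described can be illustrated with a minimal sketch. It assumes lexical scales are stored strongest-first, as in (5) and (6), and that the target scalar is a single word in the sentence; the function names and the string notation for negation are hypothetical and are not taken from Katzir (2007) or any other cited account.

```python
# Hypothetical lexical (Horn) scales, ordered from strongest to weakest,
# mirroring the ordering used in examples (5) and (6).
SCALES = [
    ["all", "most", "many", "a few", "some"],
    ["frigid", "cold", "cool"],
]


def stronger_alternatives(sentence, scalar):
    """Return the sentences obtained by swapping `scalar` for each stronger scale-mate."""
    words = sentence.split()
    if scalar not in words:
        raise ValueError(f"{scalar!r} does not occur as a word in the sentence")
    for scale in SCALES:
        if scalar in scale:
            # Everything listed before the scalar is logically stronger.
            stronger = scale[:scale.index(scalar)]
            break
    else:
        raise ValueError(f"no scale lists {scalar!r}")
    return [" ".join(s if w == scalar else w for w in words) for s in stronger]


def strengthened_reading(sentence, scalar):
    """Pair the sentence with the negation of each stronger alternative."""
    alternatives = stronger_alternatives(sentence, scalar)
    return sentence, [f"not({alt})" for alt in alternatives]


print(strengthened_reading("It was a cool day", "cool"))
```

On this sketch the strengthened reading of It was a cool day negates both It was a frigid day and It was a cold day, which is the uniform behavior that the alternative-based view leads one to expect.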
In contrast, we take scalar inference to be driven by reasoning about the
underlying semantic scale associated with scalar expressions: e.g.
quantitative/proportional scales associated with the quantifiers; scales of temperature
associated with temperature expressions; etc. We therefore take scalar strengthening to be
an inference that guides the interpreter’s construction of a mental model of the content
expressed by the speaker. We view lexical scales merely as realizations of the underlying
semantic scale over which reasoning takes place.
We nonetheless recognize the importance of the expressions in these associated
scales as vehicles for increasing the salience of alternative portions of the underlying
scale, and it is for this reason that our experiment was explicitly designed to avoid the use
of scale-mates in probing participants’ interpretations of scalars. The design was
motivated by Geurts and Pouscoulous (2009) (see also Pouscoulous, 2006; Geurts, 2009;
Doran et al., 2009), who found that experimental designs in which subjects are presented
with the scalar alternatives of a target scalar result in increased reports of strengthened
interpretations. Van Tiel et al. (2014) used such a design and justified it on the grounds
that a general heightening of scalar effects should not affect the relative frequency of
strengthening across scalars. However, it is unknown whether this heightening effect is
uniform across scalars. One contribution of the current paper is to offer a new method for
investigating scalar implicatures without presenting scalar alternatives.
The current study was designed to probe maximally natural scalar interpretation.
To this end, weak scalars were embedded in the relatively rich context of 3-6 sentence
paragraphs. Participants read the paragraphs and after each one judged the consistency or
inconsistency of each of seven sentences with the paragraph. To minimize the likelihood
that participants would focus on scalar interpretation, four of these sentences were fillers
that did not involve scalars. The remaining sentences probed scalar interpretation in a
novel way, avoiding the use of explicit scalar alternatives. Instead, these probe sentences
contained descriptions of events or states of affairs consistent with different readings of
the scalar sentences from the passage. For example, one passage contained the sentence
She noticed that many of her pencils were chewed on. To test whether many is judged
consistent with all without offering this as an explicit alternative, we asked participants to
judge whether the sentence 100% of her pencils were chewed on was consistent with the
passage. (Note that the phrase “100%” is not itself a scale mate of the term some on
standard views: first, scale mates must be equally lexicalized; some and “100%” are not;
also, they belong to different syntactic categories: some is a determiner, 100% is not.)
Although our method avoids explicit comparisons between scale mates, it may
encourage semantic/pragmatic reasoning about the meaning of the scalar term in its
context. We take this to be a benefit. Our goal is, in part, to investigate whether the
standard mechanistic approaches to scalar inferences – taking the scalar alternatives to be
linguistically given and the process of strengthening to be automatically triggered – are
consistent with ordinary interpretation, which happens in many different contexts. If
strengthening is affected (in ways going beyond granularity) by contextually induced
reasoning, this is important to take into account in our models of the process.
Experiment
Method
Participants
Forty-three native American-English speaking undergraduates from Carnegie
Mellon University received $8 each for completing the questionnaire.
Design and Stimuli
The experiment had a 9x3 repeated measures within-subjects design. The first
factor was the scalar. We tested 8 scalar words, each one associated with a non-maximal
point on an underlying scale. Target words were: cool, warm, good, like, many, some,
possible, and think. Possible was tested twice: once for a strengthening that excluded the
top of its scale (equivalent to the meaning of possible but not certain) and the other for a
strengthening that excluded a higher but not maximal point on its scale (roughly the
meaning of possible but not probable).
The target scalars were divided into three triads: {cool, good, possible1}, {many,
possible2, think}, {some, like, warm}. For each triad, we constructed twelve naturalistic
paragraphs consisting of 3-6 sentences, and included one instance of each scalar in each
paragraph. Every paragraph was about a different situation or scenario, and the scalars
were used with a variety of different argument types. For example, the verb like appeared
with direct objects that were activities, food, other people, etc. This variety allowed us to
sample strengthening across a range of naturalistic uses.
The second factor was the relation between the content of the probe sentence and
the semantic scale underlying the target scalar. For each scalar, we devised ways to
invoke a region of the underlying semantic scale clearly consistent with the bounded
(strengthened) reading of the scalar, and parallel ways to invoke a region of the scale
above that bound. Probe sentences were designed to test in two different ways whether
the target scalar was judged compatible with this higher region of the scale.
Unstrengthened-compatible (UNSTR) sentences described the relevant feature of the
passage as being above the expected bound, i.e. invoked a point or region at or close to
the top of the underlying semantic scale. These sentences would be judged consistent
with the passage only if the scalar item were given an unstrengthened interpretation.
Crucially, the unstrengthened-compatible prompts were simple, non-modal statements
(e.g. There was a 100% chance that Sally would run into Steven at the pool.) Sentences
in the Range condition (RNG) made reference to a range of values on the underlying
semantic scale, including points clearly consistent with the strengthened reading and
points lying above the upper bound induced by strengthening. The Range sentences
always included the words anywhere between modifying the given range so as to elicit a
response of “compatible with passage” only if all values on the identified range were
compatible (e.g. The temperature was anywhere between 32 and 60 degrees Fahrenheit,
or The chance of rain was anywhere between 30% and 100%). Because the Range
sentences included the values from the Unstrengthened-Compatible condition, we
expected that these conditions would pattern together.
In the third, Strengthened-compatible (STR), condition, sentences described the
relevant feature of the passage as being at a point or small range of values clearly within
the strengthened interpretation of the scalar. These sentences were expected to be judged
consistent with the passage on any reading of the target scalar (as strengthened-
compatible sentences would also be compatible with the unstrengthened reading of the
target item). See Table 1 for an example item from the cool, good, possible/certain triad.
(The complete materials are available at
http://www.cmu.edu/dietrich/philosophy/people/faculty/core-faculty/simons.html.)
** TABLE 1 ABOUT HERE **
Each of the 36 experimental paragraphs was followed by an Unstrengthened-
Compatible sentence for one scalar, a Strengthened-Compatible sentence for a different
scalar, and a Range sentence for the remaining scalar, as well as four filler sentences. Three
counter-balancing lists were created such that the assignment of Unstrengthened-
Compatible, Strengthened-Compatible, and Range sentences rotated in a Latin Square
design across the three scalars within each paragraph. The order in which scalars
appeared in a paragraph was varied. Paragraph presentation order was randomized, as
was the ordering of the statements following each passage. Filler sentences ranged in
their consistency with the paragraph, with approximately half of the fillers consistent and
half inconsistent. Some fillers were easy to judge as consistent or inconsistent with the
paragraph, others less so.
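The Latin Square rotation described above can be sketched as follows. This is only an illustration under the assumption that the rotation is a simple cyclic shift of conditions over the three scalars in a paragraph; the function name and the dictionary representation are hypothetical, and the actual list construction may have differed in detail.

```python
CONDITIONS = ["UNSTR", "STR", "RNG"]


def latin_square_lists(triad):
    """Build three counterbalancing lists for the three scalars in one paragraph.

    In list k, the scalar at position i receives condition (i + k) mod 3.
    Each scalar therefore gets each condition exactly once across the three
    lists, and each list uses every condition exactly once.
    """
    if len(triad) != len(CONDITIONS):
        raise ValueError("expected exactly three scalars per paragraph")
    return [
        {scalar: CONDITIONS[(i + k) % 3] for i, scalar in enumerate(triad)}
        for k in range(3)
    ]


for number, assignment in enumerate(latin_square_lists(["cool", "good", "possible1"]), start=1):
    print(f"List {number}: {assignment}")
```

With this rotation, a participant assigned to one list sees each paragraph with one probe sentence per condition, and across lists every scalar in the paragraph is probed in all three conditions.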
Apparatus
The questionnaire used Qualtrics software and was administered via the web.
Procedure
Instructions read as follows: for each statement following the passage, “decide
whether or not the statement is consistent with what is, for you, the most natural way
of understanding the passage” (bold in original). There followed one example passage,
example statements, and example consistency judgments with explanations. Each
subsequent questionnaire page had a passage followed by seven statements. Participants
judged the consistency of each statement with the passage by pressing either an icon
labeled “consistent with passage”, or one labeled “not consistent with passage.” After
responding to all statements, participants clicked a “continue” button to move to the next
passage. Participants could only advance if every statement had been responded to, and
after moving on, it was not possible to return to an earlier passage. Participants were
encouraged to complete the questionnaire in one sitting; however, after accessing the
questionnaire they were able to save it and complete it later (but not to return to questions
that had already been completed).
Results
For the purposes of analysis, “consistent with passage” answers were coded as 0
and “inconsistent with passage” answers were coded as 1. Performance on the 96 filler
sentences that the authors judged to be most clearly consistent or inconsistent with the
passages was used to verify that participants understood and attended to the task. Average
performance on these fillers was 80% correct, and the minimum performance was 68%
correct. No participants were clear outliers. Figure 1 reports mean proportion of sentences
in each condition that were judged to be inconsistent with the passage.
**FIGURE 1 ABOUT HERE **
Data were analyzed using mixed-effects logit models (Baayen, 2008).
The models were run in R (R Development Core Team, 2013; ver. 3.0.1) using the lmer()
function in the lme4 package (Bates, Maechler, & Bolker, 2013; ver. 0.999999-2). All
models included crossed random intercepts for participants and items; items were defined
as triads of strengthening conditions that queried the same scalar. Following Barr, Levy,
Scheepers, and Tily (2013), we included as much random slope structure as our models
could accommodate while still converging. When full models would not converge, we dropped the
random slopes that captured the least variance until the model did converge.
Our first model investigated the effects of strengthening condition. Strengthening
condition was a fixed factor, and random slopes for strengthening condition and scalar
were included for participants, and random slopes for strengthening condition for items.
Strengthening condition was treatment coded with the Strengthened-Compatible
condition as the reference level, because it should be acceptable under any reading of the
target scalar, and as such provides a baseline to compare to the other conditions.
Participants were reliably more likely to judge the Unstrengthened-Compatible condition
inconsistent than the Strengthened-Compatible condition (estimated β=3.28, z=13.22,
p<.001), but there was no difference between the Range and Strengthened-Compatible
conditions (estimated β=-.12, z=.18, p=.51).
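For readers who find it helpful, the structure of this first model can be written out explicitly. The notation is ours and is only a sketch: y is 1 when participant p judged the probe for item i inconsistent, UNSTR and RNG are indicator variables with the Strengthened-Compatible condition as the reference level, and u and w are by-participant and by-item random effects. The by-participant random slopes for scalar mentioned above are omitted for readability.

```latex
\[
\operatorname{logit} P(y_{pi} = 1)
  = (\beta_0 + u_{0p} + w_{0i})
  + (\beta_1 + u_{1p} + w_{1i})\,\mathrm{UNSTR}_{pi}
  + (\beta_2 + u_{2p} + w_{2i})\,\mathrm{RNG}_{pi}
\]
```

Under this treatment coding, the reported estimate of 3.28 corresponds to \(\beta_1\), the difference in log odds between the Unstrengthened-Compatible and Strengthened-Compatible conditions, which amounts to an odds ratio of about \(e^{3.28} \approx 26.6\). The reported estimate of -.12 corresponds to \(\beta_2\), the analogous difference for the Range condition.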
Our second set of models tested two factors that might contribute to variation in
rates of strengthening across scalars. Following van Tiel et al. (2014), the first was the
boundedness of the scale. The second was whether the Unstrengthened-Compatible
condition tested the maximal point of a bounded scale. Note that van Tiel et al. did not
test this factor, as their conceptualization of scalars does not dissociate between the top of
the scale and the stronger scalar alternative. Think/know was left out of these analyses,
because though van Tiel et al. categorized it as bounded, its categorization is not
straightforward. Cool, warm, like, and good were coded as being on unbounded scales
(-.5), and possible/certain, some, many, and possible/probable were coded as being on
bounded scales (.5). For tests investigating the factor of testing the maximal point of a
bounded scale, possible/probable was switched to the -.5 group. Critically, differences in
rates of strengthening across scalars should be evident in the Unstrengthened-compatible
conditions but not the Strengthened-compatible conditions, so this factor was included in
the models with Unstrengthened-compatible conditions coded as .5 and Strengthened-
Compatible conditions coded as -.5. However, the logic of using the Strengthened-
compatible condition as a baseline depends on that condition being judged consistent
with the passage, and for some items this was rarely the case. We therefore limited
analyses to the 80 items that were judged inconsistent in the Strengthened-compatible
condition on less than 30% of observations (see footnote 1). These models had maximal random effects
structure. A model testing 2x2 fixed factors of boundedness and strengthening condition
showed a reliable interaction (estimated β=.96, z=.48, p=.048), such that boundedness did
not affect judgments in the Strengthened-compatible condition, but scalars from bounded
scales were more likely to be judged inconsistent in the Unstrengthened-compatible
condition than scalars from unbounded scales. There was also a main effect of
strengthening condition (estimated β=4.31, z=14.50, p<.001), with Unstrengthened-
compatible conditions more likely to be judged inconsistent than Strengthened-
compatible conditions. An almost-identical model testing the effect of being at the top of
a bounded scale and strengthening condition showed a similar, but stronger, interaction
(estimated β=1.58, z=2.71, p=.007), and a similar main effect of strengthening condition
(estimated β=4.63, z=13.84, p<.001).
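The item filter and the contrast codes used in these models can be sketched as below. The codes of .5 and -.5 and the 30% cutoff come from the description above; the data layout, the function names, and the item identifiers in the usage example are hypothetical.

```python
from collections import defaultdict

# Contrast codes as described in the text; think/know is deliberately left out.
BOUNDEDNESS = {
    "cool": -0.5, "warm": -0.5, "like": -0.5, "good": -0.5,
    "possible/certain": 0.5, "some": 0.5, "many": 0.5, "possible/probable": 0.5,
}
CONDITION = {"UNSTR": 0.5, "STR": -0.5}


def top_of_bounded_scale_codes():
    """Same coding, except that possible/probable moves to the -.5 group."""
    codes = dict(BOUNDEDNESS)
    codes["possible/probable"] = -0.5
    return codes


def usable_items(rows, cutoff=0.30):
    """Return the items whose STR probe was judged inconsistent on < cutoff of trials.

    `rows` is an iterable of (item, condition, response) tuples, where a
    response of 1 means "inconsistent with passage" and 0 means "consistent".
    """
    counts = defaultdict(lambda: [0, 0])  # item -> [inconsistent, total] in STR
    for item, condition, response in rows:
        if condition == "STR":
            counts[item][0] += response
            counts[item][1] += 1
    return {item for item, (bad, n) in counts.items() if n and bad / n < cutoff}


# Made-up observations for two hypothetical items.
rows = [
    ("item_a", "STR", 0), ("item_a", "STR", 0), ("item_a", "STR", 1), ("item_a", "STR", 0),
    ("item_b", "STR", 1), ("item_b", "STR", 1), ("item_b", "STR", 0),
    ("item_a", "UNSTR", 1),
]
print(usable_items(rows))
```

In the usage example only item_a survives the filter, since its Strengthened-Compatible probe was rejected on 25% of trials, whereas item_b's was rejected on about 67%.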
Seven scalars that had previously been tested in van Tiel et al. (2014) were tested
in the current experiment (assuming that think/know and believe/know are equivalent). To
test the robustness of a scalar’s relative rate of strengthening across the two studies, we
computed Spearman’s rank correlation by ranking these scalars from least to most often
strengthened in each study and computing the correlation between these rankings. The
correlation coefficient was .86.
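The rank correlation can be computed as in the sketch below, which implements Spearman's coefficient as the Pearson correlation of the two rank vectors, with tied values sharing their average rank. The function names are ours, and the numbers in the usage example are placeholders rather than the strengthening rates from either study.

```python
def ranks(values):
    """Rank values from smallest (rank 1) upward, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for j in range(start, end + 1):
            result[order[j]] = mean_rank
        start = end + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least two values")
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Placeholder rankings of seven scalars in two studies (not the real data).
study_a = [1, 2, 3, 4, 5, 6, 7]
study_b = [3, 2, 1, 4, 5, 6, 7]
print(round(spearman(study_a, study_b), 2))
```

When there are no ties, the coefficient equals 1 - 6Σd²/(n(n² - 1)), where d is the difference between a scalar's two ranks. With n = 7, a coefficient of .86 corresponds to a summed squared rank difference of 8, assuming no ties. The placeholder rankings above have that property.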
Footnote 1: The choice of 30% as a cutoff was fundamentally arbitrary, but it balanced the need to avoid eliminating too much data against the need to keep only items for which the Strengthened-compatible condition was an appropriate baseline.
Two final analyses compared strengthening rates for individual scalars: possible
when the implicit comparison was probable versus when it was certain, and some versus
many, both of which contrasted with the meaning of all. Both models included maximal
random structure and analysis was again limited to items that were judged inconsistent in
the Strengthened-compatible condition on less than 30% of observations. The first model
tested the interaction of sum-coded fixed effects of possible/probable vs. possible/certain
and Unstrengthened-Compatible vs. Strengthened-Compatible. It revealed a reliable main
effect of strengthening condition (estimated β=5.10, z=11.07, p<.001), with the
Unstrengthened-Compatible condition judged inconsistent more often, as well as a
reliable interaction (estimated β=2.54, z=3.39, p<.001), such that there was a larger effect
of strengthening condition when possible was implicitly compared to certain than to
probable. The second model was identical, but replaced the possible factor with a sum-
coded comparison of some vs. many. This model revealed a main effect of strengthening
condition (estimated β=4.85, z=10.85, p<.001), but no interaction.
Discussion
The current findings advance our understanding of the factors influencing scalar
diversity and the content of strengthened interpretations. Consistent with findings
reported in Doran et al. (2009) and van Tiel et al. (2014), participants did not uniformly
strengthen across all scalars. Sentences containing the scalars good and think were almost
30% less likely to be strengthened than ones with the scalars many and possible. The
current experiment used richer and more natural stimuli than previous studies, used a
wider variety of contexts, eliminated the need for perspective switching by experiment
participants, included many fillers, and avoided using the critical scalar terms in
questions, yet the pattern of strengthening across scalars was very similar to the one van
Tiel et al. (2014) found. This suggests that a scalar’s relative rate of strengthening is quite
robust. It also suggests that the passages, although constructed by the experimenters,
did not introduce any unnoticed bias towards strengthened/unstrengthened
interpretations.
The current study also advances our understanding of what factors contribute to
the diversity of scalar strengthening. Consistent with van Tiel et al. (2014), the current
study found that weak scalar terms were more likely to be strengthened if their
underlying scale is bounded than if it is unbounded. However, this effect was primarily
driven by the three scalars on bounded scales for which the Unstrengthened-compatible
condition tested the maximal point on the scale. When possible, which is on a bounded
scale, was tested against Unstrengthened-compatible values that implemented probable,
it patterned with scalars from unbounded scales and differed reliably from the condition
in which possible was contrasted with the meaning of certain. This suggests that the
critical factor may not simply be the boundedness of the underlying scale, but that the
stronger alternative instantiates the critical bound. In van Tiel et al., this distinction could
not be made because there was no underlying scale; the scale was defined by the lexical
scalar alternatives. By definition in their experiment, then, the stronger scalar on a
bounded scale always instantiated a bound. Note that the lack of difference in
strengthening between some and many suggests that it is the critical bound that is
important, not the position of the weak scalar on the scale.
The current findings provide two sources of evidence against the prediction of most current
theories that a strengthened interpretation of a sentence containing a scalar
term will be inconsistent with any stronger scalar alternative. The first comes from the
two contrasts of possible. In one set of items, the Unstrengthened-Compatible sentences
for possible were consistent with the meaning of probable (a high but non-maximal
region on the probability scale). For example, one passage contained the sentence It was
possible that the store was still open. Participants seeing the Unstrengthened-Compatible
sentence subsequently judged the consistency of:
(8) It was 90-95% possible that the store was still open.
In another set of items, Unstrengthened-Compatible sentences were consistent with
certain (the maximal region on that scale). For example, one passage contained the
sentence It was possible it might rain, and participants seeing the Unstrengthened-
Compatible sentence subsequently judged the consistency of:
(9) The chance of rain was 100%
A reliable interaction between the contrast of possible and strengthening condition
indicates that possible is typically strengthened so as to exclude certain, but less typically
strengthened to exclude probable (see footnote 2). Note that neither the passages nor the test sentences
contained the word certain or explicitly contrasted probability and certainty, making it
unlikely that this effect was related to a difference in salience between what is probable
versus certain.
Footnote 2: An anonymous reviewer suggests that in the terminology used by forecasters, a chance of rain greater than 50% counts as probable. All that matters for our purposes is that probabilities in the 90-95% range still count as probable (and not as certain). Note that by using a high range for probable, we increase the likelihood of rejection of this condition.
The second source of evidence challenging standard assumptions about
strengthening is the finding that Range-condition sentences patterned with Strengthened-
Compatible sentences rather than with Unstrengthened-Compatible sentences. To see the
significance of this result, consider the example of a passage containing the sentence The
baby’s mother had sung him many of his favorite songs. For this passage, the sentences
were:
(10) (UNSTR-Comp) The baby’s mother had sung him 100% of his favorite songs.
(STR-Comp) The baby’s mother had sung him between 70% - 90% of his favorite songs.
(RNG) The baby’s mother had sung him anywhere between 70% -100% of his favorite songs.
The fact that participants tended to reject Unstrengthened-Compatible sentences as
inconsistent but accept Strengthened-Compatible sentences suggests that their
interpretations of scalar terms exclude the higher points on the scale tested in the
Unstrengthened-Compatible conditions. However, if this exclusion were total, then
participants should have rejected Range and Unstrengthened-Compatible sentences at
similar rates. Instead they accepted Range and Strengthened-Compatible sentences at
similar rates. The high acceptability of Range sentences suggests that participants did not
categorically reject the possibility that the value is at the top of the scale, but rather
assigned it a very low probability. Note that this construal of the results assumes that
when participants accept the Range sentence as consistent with the passage, they are
indicating that none of the values within the range, including the maximal value, are
absolutely inconsistent with the passage. The reason we think participants are doing this
is that the range given in the Range sentences is always accompanied by the words
anywhere between, which emphasizes the non-zero possibility of every point within the
range. It is plausible that participants take some values within the range to be more likely
than others, but as long as they consider the highest point of the range to have a non-zero
probability, the similar patterning of the Strengthened-Compatible and Range conditions
is highly informative and not predicted by standard models.
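The contrast between the two accounts can be made concrete with a small sketch. The Python code below is purely illustrative and is not part of the reported analysis: the probability values, the 0.5 threshold, and the decision rule in the hypothetical function judge are all assumptions chosen for exposition. Under these assumptions, a belief state that excludes the top of the scale outright predicts rejection of both the Unstrengthened-Compatible and Range sentences, whereas a belief state that merely assigns the top of the scale a low probability predicts acceptance of the Range sentence.

```python
# Toy illustration only: all numbers and the decision rule are assumptions,
# not estimates from the experiment.

THRESHOLD = 0.5  # hypothetical minimum probability mass for a "consistent" judgment


def judge(belief, asserted):
    """Return True (consistent) if no asserted value is ruled out outright
    and the asserted values jointly carry at least THRESHOLD probability."""
    if any(belief[v] == 0.0 for v in asserted):
        return False
    return sum(belief[v] for v in asserted) >= THRESHOLD


# Hypothetical beliefs about the percentage of songs sung, given "many".
categorical = {70: 0.3, 80: 0.4, 90: 0.3, 100: 0.0}    # top of scale excluded
graded = {70: 0.3, 80: 0.4, 90: 0.25, 100: 0.05}       # top of scale unlikely

# Values asserted by each test sentence in example (10), coarsened to four points.
sentences = {
    "UNSTR-Comp": [100],
    "STR-Comp": [70, 80, 90],
    "RNG": [70, 80, 90, 100],
}


def predictions(belief):
    """Map each condition name to the predicted consistency judgment."""
    return {name: judge(belief, values) for name, values in sentences.items()}
```

With these toy values, predictions(categorical) rejects the Range sentence along with the Unstrengthened-Compatible sentence, while predictions(graded) accepts the Range sentence alongside the Strengthened-Compatible sentence.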
One aspect of the current study that could be considered either a strength or a
weakness is that it grappled with the range of variability that characterizes scalars in
natural language. We tested each scalar across multiple rich contexts, and found
considerable variability. For example, the scalars some, many, and cool had relatively
high rates of rejection for Strengthened-Compatible sentences. This may have been
because we attempted to hold constant the points of the scales tested in the
different strengthening conditions: for example, we set one range of temperatures to correspond to
cool and another to correspond to cold, and used those ranges throughout the experiment.
However, there was contextual variability in where the most appropriate value for the
weak scalar fell on the underlying semantic scale, and it sometimes fell outside our set
ranges. Given this variability, one might be concerned that inadvertent differences in
contextual effects on strengthening within items (see Degen, 2015) might have affected
the results. However, the fact that patterns of scalar strengthening were so similar across
the current study and van Tiel et al. (2014) suggests that there are robust patterns of
strengthening for individual scalars regardless of whether the context is minimal or rich
and variable.
Nonetheless, given the extensive evidence of the influence of fine features of
context on interpretation, we consider one of the most valuable contributions of this paper
to be the innovative method for naturalistic investigation of scalar interpretation.
Acknowledgements
We would like to thank Eric Kummerfeld for assistance in creating the Qualtrics
questionnaire, Scott Fraundorf for statistical advice, and the University of Pittsburgh
Reading and Language Group and the audience of the 2014 CUNY Conference on Human Sentence
Processing for helpful comments.
Tables
Sally went to the pool around 4 o’clock. She enjoyed swimming at the end of the day: she was a good swimmer and she loved how the swim left her feeling cool and refreshed. And although she wouldn’t have admitted it to anyone, she went to the pool in part because it was possible she would run into Steven there.
cool/cold
U: After swimming, Sally would be blue-lipped and shivering.
S: After swimming, Sally would not be hot, but would not be blue-lipped and shivering.
R: After swimming, Sally would be anywhere from comfortably not-hot to blue-lipped and shivering.
good/great
U: On a scale of 1-10, where 1 is the level of an absolute beginner swimmer and 10 is the level of a champion competitive swimmer, Sally ranks a 10.
S: On a scale of 1-10, where 1 is the level of an absolute beginner swimmer and 10 is the level of a champion competitive swimmer, Sally ranks a 7 or 8.
R: On a scale of 1-10, where 1 is the level of an absolute beginner swimmer and 10 is the level of a champion competitive swimmer, Sally could rank anywhere between 7 and 10.
possible/certain
U: There was a 100% chance that Sally would run into Steven at the pool.
S: There was a 30%-70% chance that Sally would run into Steven at the pool.
R: The chances of Sally running into Steven at the pool were anywhere between 30%-100%.
Table 1. Example item. U = Unstrengthened-Compatible, S = Strengthened-Compatible, R = Range.
Figures
Figure 1. Proportion of sentences judged inconsistent for each scalar in each
strengthening condition. Error bars represent standard errors of the mean from
ANOVAs.
References
Baayen, R.H. 2008: Analyzing linguistic data: A practical introduction to statistics using
R. Cambridge, England: Cambridge University Press.
Barr, D.J., Levy, R., Scheepers, C., & Tily, H.J. 2013: Random effects structure for
confirmatory hypothesis testing: Keep it maximal. Journal of Memory and
Language, 68(3), 255-278.
Bates, D. M., Maechler, M., & Bolker, B. 2013: lme4: Linear mixed-effects models using S4
classes. R package version 0.999999.2. http://CRAN.R-project.org/package=lme4.
Bott, L., & Noveck, I. A. 2004: Some utterances are underinformative: The onset and
time course of scalar inferences. Journal of Memory and Language, 51(3), 437-
457.
Carston, R. 1988: Implicature, explicature and truth-theoretic semantics. In R. Kempson
(ed.) Mental Representation: The Interface between Language and Reality.
Cambridge: Cambridge University Press, 155-81.
Chierchia, G. 2004: Scalar Implicatures, Polarity Phenomena, and the Syntax/Pragmatics
Interface. In A. Belletti (ed.), Structures and Beyond. New York: Oxford
University Press.
Chierchia, G., Fox, D., & Spector, B. 2010: The grammatical view of scalar implicatures
and the relationship between semantics and pragmatics. In C. Maienborn, K.
von Heusinger and P. Portner (eds.), Semantics: An International Handbook of
Natural Language Meaning. Berlin: Mouton de Gruyter.
Degen, J., & Tanenhaus, M. K. 2014: Processing scalar implicature: A constraint-based
approach. Cognitive Science.
Doran, R., Baker, R. E., McNabb, Y., Larson, M., & Ward, G. 2009: On the non-unified
nature of scalar implicature: An empirical investigation. International Review of
Pragmatics, 1, 211-248.
Fox, D. 2007: Free Choice Disjunction and the Theory of Scalar Implicatures. In U.
Sauerland & P. Stateva (eds.), Presupposition and Implicature in Compositional
Semantics. Houndmills, Basingstoke, Hampshire: Palgrave Macmillan.
Gazdar, G. 1979: Pragmatics: Implicature, Presupposition, and Logical Form. New
York: Academic Press.
Geurts, B. 2010: Quantity Implicatures. Cambridge: Cambridge University Press.
Geurts, B. & Pouscoulous, N. 2009: Embedded Implicatures?!? Semantics and Pragmatics
2(4), 1-24.
Grodner, D. J. & Russell, B. 2013: Evidence for a rational probabilistic account of
Gricean implicatures. Poster presented at the 26th Annual CUNY Conference on
Human Sentence Processing. Columbia, SC.
Horn, L. R. 1972: On the semantic properties of the logical operators in English. Ph.D.
dissertation, University of California at Los Angeles, Los Angeles, California.
Horn, L. R. 1989: A Natural History of Negation. Chicago: University of Chicago Press.
Noveck, I. A. 2000: When children are more logical than adults: Experimental
investigations of scalar implicature. Cognition, 78(2), 165-188.
Papafragou, A., & Musolino, J. 2003: Scalar implicatures: experiments at the semantics–
pragmatics interface. Cognition, 86(3), 253-282.
R Development Core Team. 2013: R: A language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna, Austria. URL
http://www.R-project.org.
van Tiel, B., E. van Miltenburg, N. Zevakhina, & B. Geurts 2014: Scalar Diversity.
Journal of Semantics 0, 1-39. doi:10.1093/jos/ffu017