Probability in Logic
Hannes Leitgeb
This chapter is about probabilistic logics: systems of logic in which logical
consequence is defined in probabilistic terms. We will classify such systems and
state some key references, and we will present one class of probabilistic logics in
more detail: those that derive from Ernest Adams’ work.
1 Probability in Logic
Logic and probability have long been studied jointly: Boole (1854) is a classi-
cal example. If ‘logic’ is understood in sufficiently broad terms, then probability
theory might even be subsumed under logic (as a discipline). In the words of Ram-
sey (1926, p.157): “the Theory of Probability is taken as a branch of logic”. John
Maynard Keynes and Edwin Thompson Jaynes held similar views, and variants of
the view were defended more recently, e.g., by Howson (2003) and Haenni (2005).
In that sense, the probabilistic explication of the confirmation of hypotheses (as
initiated by Carnap 1950) may, for example, be regarded as a kind of probabilistic
(or inductive) logic; see the chapter on “Confirmation Theory” in this volume. On
the other hand, if used in such a broad manner, the label ‘probabilistic logic’ is no
longer particularly informative as far as its ‘logic’ component is concerned.
In this chapter, we will restrict the term ‘logic’ to logic proper: a logic or logical system is a triple of the form 〈L, ⊨, ⊢〉, where (i) L is a formal language, (ii) ⊨ is a semantically (model-theoretically) specified relation of logical consequence defined for the members of L, and (iii) ⊢ is a proof-theoretically (in terms of axioms and rules) specified relation of deductive consequence for the members of L. Ideally, ⊢ is sound with respect to ⊨ (that is, the extension of the ⊢ relation is a subset of the extension of the ⊨ relation), and ⊢ is complete with respect to ⊨ (the extension of the ⊨ relation is a subset of the extension of the ⊢ relation). However, not every logical system will satisfy both of these properties. Logic qua discipline is then the area in which logics in this sense are defined and in which they are studied systematically.
Now consider a logic in such a sense of the word: call the formal language L for which ⊨ and ⊢ are defined the ‘object language’, and call the language in which ⊨ and ⊢ are defined the ‘metalanguage’. We can then define probabilistic logics to be precisely those logics 〈L, ⊨, ⊢〉 for which the definition of ⊨ involves reference to, or quantification over, probability measures (which are then usually defined for the formulas in L or for subformulas thereof). And the area in which probabilistic logics in this sense are specified and described is probabilistic logic as a discipline. Probabilistic logic in this sense is the topic of this chapter. Reference to, or quantification over, probability measures on the metalevel will thus be a given in anything that follows.
For instance, in section 3 of this chapter we will consider a definition of logical
consequence for object language conditionals that will look like this:
• We say that

{ϕ1 ⇒ ψ1, . . . , ϕn ⇒ ψn} ⊨ α ⇒ β

iff for all ε > 0 there is a δ > 0, such that for all probability measures P: if for all ϕi ⇒ ψi it holds that P(ψi|ϕi) > 1 − δ, then P(β|α) > 1 − ε.
All of the technical details concerning this definition will be explained in section
3. What is relevant right now is just that this is a typical example of specifying a
probabilistic logic: for the logical consequence relation ⊨ is determined semantically by quantifying over probability measures (“for all probability measures P”).
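To get a feel for this ε-δ pattern, here is a small Monte Carlo sketch in Python (an illustration of the definition, not part of Adams’ formal apparatus): it samples probability measures over the eight worlds for three propositional variables and checks the instance {ϕ ⇒ ψ, ϕ ⇒ χ} ⊨ ϕ ⇒ (ψ ∧ χ), for which δ = ε/2 happens to witness the definition.

```python
import itertools
import random

# Worlds: truth-value assignments to three propositional variables (phi, psi, chi).
WORLDS = list(itertools.product([False, True], repeat=3))

def cond_prob(P, event, given):
    """P(event | given) for a distribution P over WORLDS; stipulated to be 1
    when P(given) = 0, matching the convention adopted in section 3."""
    pg = sum(p for w, p in zip(WORLDS, P) if given(w))
    if pg == 0:
        return 1.0
    return sum(p for w, p in zip(WORLDS, P) if given(w) and event(w)) / pg

phi = lambda w: w[0]
psi = lambda w: w[1]
chi = lambda w: w[2]

random.seed(0)
eps = 0.1
delta = eps / 2  # a delta that witnesses this eps for this particular inference

for _ in range(10_000):
    weights = [random.random() for _ in WORLDS]
    total = sum(weights)
    P = [x / total for x in weights]
    if cond_prob(P, psi, phi) > 1 - delta and cond_prob(P, chi, phi) > 1 - delta:
        # both premise conditionals are sufficiently probable,
        # so the conclusion conditional must be as well
        assert cond_prob(P, lambda w: psi(w) and chi(w), phi) > 1 - eps
```

The underlying arithmetic is just conditional subadditivity: P(ψ ∧ χ|ϕ) ≥ P(ψ|ϕ) + P(χ|ϕ) − 1 > 1 − 2δ = 1 − ε.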
As we are going to see, research in probabilistic logic even in this restrictive
sense still cuts through various disciplines: theoretical computer science, artifi-
cial intelligence, cognitive psychology, and philosophy—especially, philosophical
logic, philosophy of science, formal epistemology, and philosophy of language.
However, the focus of this chapter will be on those aspects of probabilistic logic
that seem most relevant philosophically. For instance, the logical consequence re-
lation defined above extends nicely to a logico-philosophical theory of indicative
conditionals in natural language, as developed by Adams (1975).
The rest of this chapter is organized as follows. In section 2 we will turn to
a classification of probabilistic logics along two dimensions: (1.) those which
do not involve reference to, nor quantification over, probability measures on the
object level, and (2.) those which do. And within the second class, we will distinguish between (a.) probabilistic logics which do involve explicit reference to,
or quantification over, probability measures on the object level, and (b.) those for
which this kind of reference or quantification remains implicit. We will regiment
some of the essential references on probabilistic logic into the resulting simple
classification system. When doing so, we will refrain from going into formal de-
tails. And our selection of references will be, obviously, biased and incomplete.
Section 3 will then be devoted to a concrete, and formally more detailed, case
study of probabilistic logic(s): Ernest Adams’ logic of high probability condition-
als and some of its close relatives and variations. We have chosen this example
because it may safely be called the most influential, and probably also the most in-
novative, instance of probability in logic in the philosophical corner of the subject
matter. We will present six types of semantics for high probability conditionals in
that section. Although these semantics will look different initially, and although
they are based on different motivations, they will be seen to determine one and the
same deductive system of axioms and rules for conditionals: Adams’ system P.
Section 3 will be based partially on material from chapters 9-11 in Leitgeb (2004)
(albeit with substantial revisions).
Some final remarks before we start classifying probabilistic logics: First of
all, in probabilistic logic, probability measures are typically defined on formulas
rather than on sets (events) as standard probability theory would have it. We will
state the essentials of such probability measures for formulas in section 3, but only
for the very restrictive case of the language of propositional logic. For the exten-
sion to first-order languages, see Gaifman (1964), Scott and Kraus (1966), Fenstad
(1967), Fagin (1976), Gaifman and Snir (1982), Nilsson (1986), Richardson and
Domingos (2006) (for probabilities of first-order formulas as given by Markov
networks), and (as far as inductive logic is concerned) Paris (2011), which may
all be taken to develop probabilistic model theories for first-order formulas. Stan-
dard models or truth evaluations for formulas are thereby replaced by probability
measures, and truth values for formulas are replaced by probabilities.
Secondly, throughout this chapter, the underlying base logic for probability
measures will be assumed to be classical. E.g., one of the axioms for probability
measures on formulas (see section 3) will demand the probability of all classical
tautologies to be 1. But there are also probability measures for which classical
logic is not presupposed in this way: see the chapter on “Probability and Non-
Classical Logic” in this volume for more information.
Thirdly, probability measures can be interpreted in different ways: as an agent’s
rational degrees of belief in propositions, as objective non-epistemic chances of
the occurrence of physical events, as statistical ideal long-term frequencies of
properties applying to individuals in a certain domain, and more. Mostly, prob-
abilistic logics are open to different such interpretations simultaneously, which
is why we will not deal with the topic of interpreting probabilities very much,
even though one of these interpretations is usually put forward as the “intended”
such interpretation (and in most cases that intended interpretation is the subjective
“Bayesian” one in terms of rational degrees of belief).
Fourthly, by turning our attention to logical consequence relations on formal
object languages, we put to one side all theories that combine aspects of logic and
probability in a different manner. In particular, there is a substantial literature on
how to combine a logical account of (all-or-nothing) belief or acceptance with a
probabilistic account of numerical degrees of belief, starting with Kyburg (1961),
Hempel (1962), Levi (1967), Hilpinen (1968), and Swain (1970), through the
more recent literature (for overviews see Foley 1993, Maher 1993, Christensen
2004, Huber and Schmidt-Petri 2009) to the most recent of such theories (e.g.,
Hawthorne and Makinson 2007, Sturgeon 2008, Wedgwood 2012, Lin and Kelly
2012, Leitgeb 2014, Buchak, forthcoming, Ross and Schroeder, forthcoming).
Typically, these theories do not use formal languages (in the sense of formal logic)
when stating the logical closure properties of belief, or the probabilistic axioms
for degrees of belief, or principles of how belief relates to degrees of belief. Nor
do they aim to define logical consequence relations for formal languages in prob-
abilistic terms; which is why we will not cover these theories in this chapter.
2 The Classification of Probabilistic Logics
By our definition from the last section, a probabilistic logic 〈L, ⊨, ⊢〉 includes a logical consequence relation ⊨ that is specified on the metalevel by referring to, or quantifying over, probability measures.
The first main decision point for probabilistic logics concerns the question
of whether or not such a logic also involves reference to, or quantification over,
probability measures on the object level:
1. Probabilistic logics which do not involve reference to, nor quantification
over, probability measures on the object level:
These are logical systems in which � is defined in probabilistic terms, but
where the object language L itself (such as, e.g., the language of proposi-
tional logic) is not expressive enough to ascribe probabilities to formulas.
One group of references in this category emerges from Popper (1955) who
axiomatized primitive conditional probability measures for formulas au-
tonomously from logic, that is, without presupposing (meta-)logical con-
cepts such as tautology or logical consequence in the axioms for condi-
tional probability themselves. Such conditional probability measures are
not defined in terms of ratios of unconditional probabilities, as standard
probability theory has it. That is why they can allow for a conditional
probability P(β|α) to be defined even when P(α) = 0 (see Halpern 2001,
Makinson 2011, and chapter “Conditional Probability” in this volume for
an overview). Although logical concepts are not used in their definition,
these measures still end up being based on classical logic due to the manner
in which their axioms are set up. Indeed, in turn, it becomes possible now
to define logical concepts, such as the relation of logical consequence, for
the language of propositional logic in purely probabilistic terms. Popper’s
corresponding probabilistic account of logical consequence was extended
later also to first-order languages by Field (1977), Leblanc (1979), van
Fraassen (1981), and Roeper and Leblanc (1999), and to languages with
modalities by Morgan (1982) and Cross (1993). E.g., as far as the lan-
guage L of propositional logic is concerned, Field suggests defining logical
consequence probabilistically in the following manner: α1, . . . , αn ⊨ β if and only if for every primitive conditional probability measure P on L that satisfies Popper’s axioms, and for all formulas γ, it holds that P(β|γ) ≥ P(α1 ∧ . . . ∧ αn|γ). Leblanc (1983) gives a simpler definition of consequence in terms of unconditional probabilities (which can be defined from conditional probabilities): α1, . . . , αn ⊨ β if and only if for every probability measure P on L, if for every αi it holds that P(αi) = 1, then also P(β) = 1.
The second group of references in this category has its source in Suppes
(1966) who studied to what extent the probability of the conclusion of a
logically valid argument may fall below the probabilities of the premises
of the argument. It is easy to see that Suppes’ observations can be turned
into a probabilistic definition of ⊨ for the language L of propositional logic,
as worked out in detail by Ernest Adams. Adams is also responsible for
extending the account to conditionals α ⇒ β with a new primitive connec-
tive ⇒ that is not definable by means of the connectives of propositional
logic and which one may take to express high conditional probability. We
will turn to Adams’ work on high probability conditionals in more detail in
section 3, but as far as the language L of propositional logic is concerned,
logical implication for L may be defined probabilistically in the Suppes-Adams style as follows: α1, . . . , αn ⊨ β if and only if for every probability measure P on L it holds that P(β) ≥ 1 − n + P(α1) + . . . + P(αn). This consequence relation can then be shown to coincide extensionally with that of classical logic. Here is an example of how this result can be applied: since α1, α2 ⊨ α1 ∧ α2 in classical logic, it follows from applying the left-to-right direction of the equivalence above to the case of n = 2 that if P(α1) ≥ 1 − ε and P(α2) ≥ 1 − ε, then P(α1 ∧ α2) ≥ 1 − 2ε (and one can also show that this lower bound cannot be improved, unless additional information on the logical structure of α1 and α2 is available).
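A quick way to see that the lower bound 1 − 2ε cannot be improved is to exhibit a probability measure attaining it; the following Python sketch (with ε = 0.1, an arbitrary illustrative choice) does so over the four truth-value combinations for α1 and α2.

```python
# Worlds as pairs of truth values for (a1, a2); the weights are chosen so that
# the Suppes-Adams lower bound 1 - 2*eps is attained exactly (eps = 0.1 here).
eps = 0.1
P = {
    (True, True): 0.8,   # a1 and a2 both true
    (True, False): 0.1,  # a1 only
    (False, True): 0.1,  # a2 only
    (False, False): 0.0,
}

def prob(event):
    """Probability of an event, i.e. a predicate on worlds."""
    return sum(p for w, p in P.items() if event(w))

p_a1 = prob(lambda w: w[0])
p_a2 = prob(lambda w: w[1])
p_conj = prob(lambda w: w[0] and w[1])
assert p_a1 >= 1 - eps and p_a2 >= 1 - eps
assert abs(p_conj - (1 - 2 * eps)) < 1e-12  # the bound 1 - 2*eps is attained
```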
Now we turn to systems of probabilistic logic whose object languages are expressive enough to ascribe probabilities to formulas.
2. Probabilistic logics which do involve reference to, or quantification over,
probability measures on the object level:
The first class of such probabilistic logics concerns object languages that
allow for ascribing probabilities to formulas explicitly:
(a) Probabilistic logics which do involve explicit reference to, or quantifi-
cation over, probability measures on the object level:
By ‘explicit’ we mean: Either a sentential probabilistic operator or a
probabilistic generalized quantifier is applied to a formula, or alter-
natively a probabilistic function symbol is applied to (the name of)
a formula. The result of these applications is then combined somehow with expressions of the form ‘= r’, ‘≥ r’, or the like, where ‘r’ is a numeral denoting a real number in the unit interval. This leads to probabilistic formulas such as: ‘P(α) = r’ (“the probability of α is r”) or ‘P(α) ≥ r’ (“the probability of α is greater than or equal to r”) or the like. A probability measure P can be said to satisfy such a formula, if interpreting the symbol ‘P’ by the measure P
yields a true statement. Finally, logical consequence relations are de-
fined for formal languages that include probabilistic formulas of such
types. Usually, this is achieved by defining consequence in terms of
truth preservation in all probability models for the object languages in
question. Roughly: α1, . . . , αn ⊨ β if and only if for every probability measure P on L, if P satisfies each of α1, . . . , αn, then P satisfies β.
And deductive consequence relations may be defined which are then
proven sound and, where possible, complete with respect to logical
consequence. In a nutshell: this part of probabilistic logic deals with
formalizations of the language of probability theory or various natu-
ral fragments thereof, such that the formalization pays off in terms of
an improved control over the relations of logical and deductive conse-
quence. This is in contrast with standard probability theory in which
the entailment relation is usually left merely informal and implicit.
For instance, one way of presenting Hailperin’s (1984, 1996, 2000)
probabilistic logics is for them to involve object languages in which
one can express that the probability of a formula is within a particular
interval of real numbers. However, it is not yet possible to express in
these languages the probabilities of probability statements themselves,
that is, so-called second-order or higher-order probabilities, as for ex-
ample ‘P(P(α) ∈ [r, s]) ∈ [r′, s′]’ (“the probability that the probability
that α is in the interval [r, s] is in the interval [r′, s′]”). Gaifman (1986)
is the classical source for the logical treatment of higher-order prob-
abilities, and most of the literature on probabilistic logics that allow
for higher-order probabilities builds on it. (Such higher-order proba-
bility statements can be varied in lots of ways, so that e.g. “outer” and
“inner” occurrences of probabilistic symbols are assigned distinct in-
terpretations; see e.g. Lewis 1980, van Fraassen 1995, Halpern 1991.)
Another landmark paper in that area is Fagin, Halpern, and Megiddo
(1990), who essentially logically formalize Nilsson’s (1986) account
of probabilistic reasoning on formulas (which had not been stated with
a formal language as yet). Their object language is even more expres-
sive than what we dealt with before. E.g., in their language one can
say that the weighted sum of probabilities of finitely many formulas
is greater than or equal to a certain real number, as in a1P(α1) + . . . + anP(αn) ≥ r, where each of the αi may again include logical connectives or P. The resulting language can also be extended to encompass
Boolean combinations of such inequalities, inequalities for conditional
probabilities, and first-order quantification over real numbers (based on
the first-order theory of real closed fields). The authors provide se-
mantic interpretations for these object languages. Relying on findings
from linear programming, they determine sound and complete axiom-
atizations of the corresponding logical consequence relations, and they
state NP-complete decision procedures for the corresponding satisfiability problems.
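To illustrate the linear-programming flavor of this kind of probabilistic satisfiability, here is a brute-force Python sketch (not the authors’ algorithm: it simply enumerates distributions whose weights are multiples of 1/20, a coarse grid that happens to suffice for this example). Given the constraints P(p) = 0.7 and P(p → q) = 0.8, it recovers the tight bounds 1/2 ≤ P(q) ≤ 4/5 on the probability of the conclusion q.

```python
from itertools import product
from fractions import Fraction

# Worlds for two variables (p, q), in the order TT, TF, FT, FF.  We search all
# distributions whose weights are multiples of 1/20; the constraints and the
# tight bounds all lie on that grid.  (The general method instead reduces the
# problem to linear programming over the 2^n world-weights.)
N = 20
lo, hi = Fraction(1), Fraction(0)
for tt, tf, ft in product(range(N + 1), repeat=3):
    ff = N - tt - tf - ft
    if ff < 0:
        continue
    P = [Fraction(x, N) for x in (tt, tf, ft, ff)]
    p_p = P[0] + P[1]           # P(p)      = P(TT) + P(TF)
    p_imp = P[0] + P[2] + P[3]  # P(p -> q) = P(TT) + P(FT) + P(FF)
    if p_p == Fraction(7, 10) and p_imp == Fraction(8, 10):
        p_q = P[0] + P[2]       # P(q)      = P(TT) + P(FT)
        lo, hi = min(lo, p_q), max(hi, p_q)

print(lo, hi)  # prints: 1/2 4/5
```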
Here are some further closely related probabilistic logics in the same
category: Frisch and Haddawy’s (1988) object language is less expres-
sive than Fagin, Halpern, and Megiddo’s, although one can still say
that the probability of a formula is in a certain interval of real numbers,
and the corresponding probability operators can be nested again so that
also higher-order probabilities can be ascribed. On the logical side,
building on Gaifman’s (1986) work, the semantics is set up so that
Miller’s principle—a typical instance of a higher-order so-called prob-
abilistic reflection principle—is logically valid: P(α | P(α) ∈ [r, s]) ∈
[r, s] (“the conditional probability of α, given that the probability of α
is in the interval [r, s], is in the interval [r, s]”).
Heifetz and Mongin (2001) is another theory that employs a language
less expressive than Fagin, Halpern, and Megiddo’s, but it comes with
a special benefit: a less demanding fragment of arithmetic needs to
be built into the corresponding axiomatic system. Speranski (2013)
extends Fagin, Halpern, and Megiddo’s account by adding also quantifiers over propositions.
The object language of Bacchus’ (1990a, 1990b) probabilistic logic
is even more expressive than Fagin, Halpern, and Megiddo’s, at least
as far as quantification is concerned, but there are also differences in
terms of interpretation: while Fagin, Halpern, and Megiddo’s probability measures are most easily interpreted as expressing subjective
probabilities of closed sentences that determine sets of possible worlds,
Bacchus also considers probability measures which are best interpreted
as expressing statistical probabilities of open formulas that determine
ensembles of individuals in a domain. (See also Hoover 1978 and
Keisler 1985 for probabilistic logics with generalized quantifiers that
concern probabilities of sets of tuples of individuals.) Such subjec-
tive and statistical probability measures can also be combined and ex-
pressed in one and the same logical system, as developed in Halpern
(1990), Bacchus et al. (1996), and chapter 11 of Halpern (2003).
We should also mention some complexity results. We already men-
tioned Bacchus’ (1990a, 1990b) system: his axiomatic system is com-
plete with respect to models that are based on nonstandard probability
measures. (More will be said about nonstandard probability in section
3.) But now consider systems for subjective probability, or statistical
probability, or both combined, where the object language includes: at
least one probabilistic function symbol, the equality symbol, quanti-
fiers (and at least one individual constant symbol). For any system of
such type, Abadi and Halpern (1994) showed that if its set of logical
truths is determined by models that involve only standard probabil-
ity measures, then that set is not recursively axiomatizable anymore
(unless further syntactic restrictions are invoked).
Fagin and Halpern (1994) extend the theory of Fagin, Halpern, and
Megiddo (1990) in a different direction, by adding epistemic operators such as for knowledge, and Kooi (2003) and van Benthem, Gerbrandy, and Kooi (2009) further extend the account by invoking dynamic
epistemic or probabilistic operators such as for knowledge change and
probability change.
Finally, originating from a very different background—formal theories
of truth and the study of semantic paradoxes (such as the famous Liar
paradox)—Leitgeb (2012) even allows for type-free probabilities: he
presents different systems of probabilistic logic in which probabilities
are ascribed to formulas that may speak about their own probabilities,
such as a formula α that is provably equivalent to P(α) < 1 (so that
α may be said to express: my probability is less than 1). Christiano
et al. (unpublished) present an alternative theory of type-free proba-
bility, and Caie (2013) gives reasons why one ought to be interested
philosophically in type-free probability in that sense.
(b) Probabilistic logics which only involve implicit reference to, or quan-
tification over, probability measures on the object level:
Whereas the previous category of probabilistic logics concerned ways
of expressing probabilities on the usual numerical scale of concepts,
the probabilistic logics in this category typically involve expressions
for probabilities that merely occupy a categorical (all-or-nothing) or
ordinal (comparative) scale of concepts. In the words of Halpern and
Rabin (1987, 381): “probability theory is not the only way of reason-
ing about likelihood”. The relations of logical consequence for the corresponding object languages are either defined in terms of truth preservation again—over probability measures themselves or over possible
worlds models that are given a probabilistic interpretation—or they
are not defined in terms of satisfaction at all. We already encountered
Adams’ example of the latter kind in section 1, and we will return to
this in more detail in section 3.
One group of such logical systems concerns formal languages with
an ‘it is (highly) probable that’ operator. With it, one is able to ex-
press that P(α) > r (or maybe instead P(α) ≥ r) for a fixed real number threshold 1/2 < r < 1 that is not denoted explicitly in the object
language. Hamblin (1959) was probably the first to study this (but
still without iterations of the probability operator). Burgess (1969)
presents a semantics for such an operator (in the strictly-greater-than
version). He also presents sound, but not complete, axiomatizations of
the corresponding logic even in the case in which nestings of the oper-
ator are allowed. So does Arlo-Costa (2005) who suggests a neighbor-
hood semantics for the operator. And Burgess (1969) gives decision
procedures for the set of logically true (valid) formulas and the set of
satisfiable formulas relative to a given threshold 1/2 ≤ r < 1.
All of these logics are characterized by ‘it is probable that α’ and ‘it is
probable that β’ failing to entail jointly ‘it is probable that α ∧ β’, in
line with the fact that the probability of a conjunction may fall below
that of its conjuncts (as exemplified nicely in Kyburg’s famous Lottery
Paradox—see Wheeler 2007 for an overview). This is clearly in con-
trast with normal systems of modal logic, which are based on a possible
worlds semantics rather than a neighborhood semantics. For according
to them, ‘it is necessary that α’ and ‘it is necessary that β’ do jointly
entail ‘it is necessary that α ∧ β’.
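Kyburg’s Lottery Paradox can itself be turned into a minimal computational illustration of this failure of conjunction. In the following Python sketch (the numbers are illustrative), the worlds record which one of 100 tickets wins, under a uniform measure:

```python
from fractions import Fraction

# Worlds: which one of the n tickets wins; the measure is uniform over worlds.
n = 100
worlds = range(n)

def P(event):
    """Probability of an event, i.e. a set of worlds."""
    return Fraction(sum(1 for w in worlds if w in event), n)

def loses(i):
    """The proposition 'ticket i loses': every world in which i does not win."""
    return {w for w in worlds if w != i}

# Each conjunct is highly probable ...
assert all(P(loses(i)) == Fraction(99, 100) for i in range(n))

# ... but their conjunction ('every ticket loses') is the empty proposition.
all_lose = set(worlds)
for i in range(n):
    all_lose &= loses(i)
assert P(all_lose) == 0
```

So for any threshold r < 99/100, every single ‘it is probable that ticket i loses’ holds, while ‘it is probable that every ticket loses’ fails maximally.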
Halpern and Rabin (1987) develop yet another, though somewhat different, logical system for such an ‘it is (highly) probable that’ operator.
And Terwijn (2005) studies a probabilistic logic the object language
of which is that of first-order logic but where the truth condition of
universally quantified formulas is given by a probabilistic threshold
condition again.
The second group of references in this category concerns logics for
an ‘it is probabilistically certain that’ operator by which one can ex-
press that P(α) = 1. The system of Rescher (1962), in which the box
or “necessity” operator is interpreted in such a probabilistic manner,
is an early example (but see also Hailperin 1937). The modal logic
and semantics that emerge from this interpretation correspond to that
of the standard modal system S5, in which nestings of the new oper-
ator are allowed and where ‘it is probabilistically certain that α’ and
‘it is probabilistically certain that β’ do jointly entail ‘it is probabilis-
tically certain that α ∧ β’. Of course, this is just as it should be, as the
axioms of probability do imply that the probability of a conjunction
is 1 if the probability of each of its conjuncts is. Lando (2010) is a
different, and more recent, example of a normal modal logic (in her
case, S4) in which the box operator gets assigned a measure-theoretic
interpretation (though a different one than Rescher’s).
The next two groups of logical systems in the present class are exten-
sions of the first and the second group, respectively, to probabilistic
conditional operators⇒ in the object language, or to binary so-called
nonmonotonic consequence relations |∼ that are expressed metalin-
guistically but which may be viewed as corresponding to sets or theo-
ries of probabilistic object-linguistic conditionals closed under certain
rules. In particular, Hawthorne (1996, 2007), Hawthorne and Makin-
son (2007), and Makinson (2012) study relations |∼ between formulas
in the language of propositional logic, such that α |∼ β if and only if
P(β|α) > r (or P(α) = 0). Here, P is a given probability measure, and r is a given real number threshold such that 1/2 < r < 1, and the
threshold is again not denoted explicitly in the object language. Arlo-
Costa and Parikh (2005) also determine nonmonotonic consequence
relations probabilistically but they do so for the probability 1 case,
such that α |∼ β if and only if P(β|α) = 1. However, in their case P
is assumed to be a primitive conditional probability measure as dis-
cussed briefly in the context of Popper’s work in our first category of
probabilistic logics from above. While

    α |∼ β    α |∼ γ
    ----------------  (And)
      α |∼ (β ∧ γ)
is logically valid in Arlo-Costa and Parikh’s system, it is invalid in
Hawthorne and Makinson’s system. If Arlo-Costa and Parikh’s logic
for nonmonotonic consequence relations is reconstructed as a logic
for conditionals—so that α ⇒ β expresses in the object language that
α |∼ β (or P(β|α) = 1) holds as expressed in the metalanguage—
then the resulting logical consequence relation ⊨ for such conditionals
is monotonic again, and it can be axiomatized in a sound and com-
plete manner in terms of (Adams’) logical system P in section 3 be-
low. (For this to be the case it is crucial that Arlo-Costa and Parikh
assume ‘P’ to refer to a primitive conditional probability measure that
satisfies Popper’s axioms.) And if the logic of Hawthorne and Makin-
son is reconstructed as a logic of conditionals in a similar manner,
then, metaphorically speaking, its axiomatization can be seen as the
result of “subtracting” the And rule above from the system P in sec-
tion 3. However, it turns out to be quite difficult to state a sound and
complete axiomatization for the logical consequence relation that is
wanted. That relation ⊨ is given semantically by:

• {ϕ1 ⇒ ψ1, . . . , ϕn ⇒ ψn} ⊨ α ⇒ β iff for all P, for all r ∈ [0, 1]: if for all ϕi ⇒ ψi it holds that P(ψi|ϕi) > r (or P(ϕi) = 0), then P(β|α) > r (or P(α) = 0).
Hence, logical consequence corresponds to probability preservation
above a threshold. Hawthorne and Makinson (2007) conjectured that
their deductive system O was sound and complete (for Horn rules with
finitely many premises). However, Paris and Simmonds (2009) proved
it to be incomplete, while the infinite system of rules that Paris and
Simmonds ultimately did prove to be sound and complete is highly
complicated and not very intuitive.
Adams’ logic of high probability conditionals to which we will turn in
more detail in the next section lies somewhere in between Hawthorne
and Makinson’s and Arlo-Costa and Parikh’s accounts: the And rule
that was mentioned above is logically valid in Adams’ system (if stated
for conditionals), while Adams’ intended interpretation of α ⇒ β is
that the conditional probability of β given α is high, but not necessarily
equal to 1. The difference from Hawthorne and Makinson’s interpretation is that the threshold defining the term ‘high’ remains only vaguely determined: no fixed real number threshold is intended to be “the” correct one.
Finally, there are probabilistic logics which also belong to the present
category but which represent probability measures in the object lan-
guage differently from the logics discussed so far. E.g., Segerberg
(1971) and Gardenfors (1975) study logical systems with an ‘is at
least as probable as’ operator by which the laws of so-called qualitative probability (which originated with Bruno de Finetti) can be expressed in logical terms. Baltag and Smets (2008) develop logics for
dynamic operators that represent probabilistic update in the object lan-
guage. Yalcin (2010) presents a nice survey on probabilistic operators
of various kinds from a philosophical and linguistic point of view. Yal-
cin’s paper also includes further relevant references to logical studies
of probabilistic operators on a categorical or ordinal scale.
3 A Case Study: Probabilistic Logics for High Prob-
ability Conditionals
In this final section we will study six distinct but extensionally equivalent se-
mantics for “high probability” conditionals, which all derive from Ernest Adams’
work. Afterwards, we will turn to their axiomatic treatment.
3.1 Semantics for High Probability Conditionals
We need some preliminary definitions, before we can state the different versions
of probability semantics for high probability conditionals.
First of all, throughout the following two subsections, let L be the formal
language of standard propositional logic, except that L is restricted to only finitely
many propositional variables p1, . . . , pn. As far as logical symbols are concerned,
we assume the standard connectives of classical propositional logic to be included
in the vocabulary of L: ¬, ∧, ∨, → (for the material conditional), ↔ (for material equivalence). So L contains formulas α such as ¬p1, p2 → (p3 ∨ p4), ¬(p5 ∧ ¬p5), and the like (assuming that n ≥ 5), as usual.
Secondly, let W be the (finite) set of all classical truth value assignments to
p1, . . . , pn. More briefly, we shall speak of W as the set of all logically possible
worlds over L, since each single member of W determines uniquely a logically
possible model or way of assigning truth values to the formulas in L in line with
the usual semantic rules. If the model determined by w in W satisfies α, then we will denote this by: w ⊨ α. In the terminology of probability theory, W is
going to function as the sample space of our probability measures; accordingly,
the members of W may also be regarded as the possible outcomes of a random
experiment.
Thirdly, with the set W in place, call 〈W, ℘(W), P〉 a probability space
(over W) if and only if (i) ℘(W) is the power set over W (the set of all subsets
of W), and (ii) P is a probability measure on ℘(W), that is, P : ℘(W) → [0, 1],
P(W) = 1, P(∅) = 0, and the axiom of finite additivity holds: for all X,Y ⊆ W,
such that X∩Y = ∅, it is the case that P(X∪Y) = P(X) + P(Y). (The axiom of so-
called countable additivity or σ-additivity will not be assumed and will not play a
role in any of the following.) Conditional probabilities can then be introduced by
means of P(Y |X) = P(X∩Y)/P(X) in case P(X) > 0. In one of the semantic systems below
we will actually deviate from this definition by allowing also for non-standard
real numbers in the unit interval to be assigned by P; but in all of the other seman-
tic systems we will stick to the definition just presented. The members of ℘(W)
will be called ‘propositions’—W is the largest or “tautological” proposition, ∅
is the least or “contradictory” proposition—and thus probability measures in this
sense assign probabilities to propositions and not (yet) to formulas. In standard
probability theory, the members of ℘(W) would rather be called ‘events’, but the
difference is immaterial here. (More importantly, standard probability theory al-
lows for certain subsets of the sample space W not to be assigned probabilities at
all; this will not be important either in what follows.)
Fourthly, although each P assigns probabilities to propositions, it may also
be used, indirectly, to assign probabilities to formulas in L (and we will use the same
function symbol ‘P’ for that purpose): for α inL, let [α] = {w in W |w � α}. [α] is
the set of worlds in which α is true, and we regard it as the proposition expressed
by α. And for each α we can then define: P(α) = P([α]). Accordingly, for α, β
in L, define P(β|α) = P(α∧β)/P(α) in case P(α) > 0. In order to simplify matters a bit
later on, we will also regard P(β|α) as well-defined, and indeed as equal to 1, if
P(α) = 0. P(β|α) is the conditional probability that will be associated later with
the high probability conditional α⇒ β.
Fifth, following Ernest Adams’ lead, we define the so-called uncertainty of β
given α by means of Unc(β|α) = 1 − P(β|α). Unc(β|α) will be the uncertainty
associated with the high probability conditional α⇒ β.
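The preliminary definitions above can be sketched in a few lines of Python. The choice of n = 3 propositional variables, the uniform measure, and all function names are our own illustrative assumptions, not part of the chapter's formal apparatus:

```python
from itertools import product

# A toy version of the setup: worlds are truth-value assignments to p1, p2, p3.
WORLDS = list(product([True, False], repeat=3))  # the sample space W (8 worlds)

# A probability measure on the power set of W is fixed by its values on
# singletons; for concreteness we take the uniform measure (our own choice).
weights = {w: 1 / len(WORLDS) for w in WORLDS}

def P(proposition):
    """P(X) for a proposition X, i.e., a set of worlds (finite additivity)."""
    return sum(weights[w] for w in proposition)

def ext(formula):
    """[alpha]: the set of worlds satisfying a formula, given as a predicate."""
    return {w for w in WORLDS if formula(w)}

def P_cond(beta, alpha):
    """P(beta|alpha), equal to 1 by the convention above if P(alpha) = 0."""
    pa = P(ext(alpha))
    if pa == 0:
        return 1.0
    return P(ext(lambda w: alpha(w) and beta(w))) / pa

def unc(beta, alpha):
    """Adams' uncertainty Unc(beta|alpha) = 1 - P(beta|alpha)."""
    return 1 - P_cond(beta, alpha)

p1 = lambda w: w[0]  # the propositional variable p1, read as a predicate on worlds
p2 = lambda w: w[1]
```

Under the uniform measure, for instance, P_cond(p2, p1) evaluates to 1/2, and a contradictory antecedent yields the stipulated value 1.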
Finally, let our conditional language L⇒ be the set of conditionals of the form
α ⇒ β for which the antecedent α and the consequent β are formulas in L. ⇒
is a conditional connective that is not included in the vocabulary of L, in partic-
ular, it is meant to differ from the symbol → for the material conditional. The
intended interpretation of the new conditional α ⇒ β will be ‘if α, then it is
highly likely that β’ or ‘the conditional probability of β given α is high’. We
want to leave open whether asserting such a conditional is meant to express the
proposition that the corresponding conditional probability is high or whether as-
serting it merely expresses the corresponding high conditional probability in a
more direct, non-propositional, “expressivist” manner (for the difference between
the two interpretations, see section 44 of Bennett 2003). Either way, we will
call these conditionals ‘high probability conditionals’, so that “their” probabilities
are given in terms of their corresponding conditional probabilities, and it is these
conditional probabilities that are taken to be high. As Lewis (1976) showed in
terms of his famous triviality theorems, and as subsequent work on the same topic
made even clearer (such as by Hajek 1989), this ‘probabilities of conditionals are
conditional probabilities’ claim ought not to be understood in the way that the un-
conditional probability of the proposition expressed by α ⇒ β would be required
to equal the conditional probability of β given α, for this would force the
underlying probability measure to be trivial as far as its range of possible
numerical values is concerned, given only some very mild background assumptions.
Instead, if one wants to speak of probabilities of conditionals at all, one should
think of their probabilities as being defined as conditional probabilities without
any assumption to the effect that probabilities of conditionals would also have to
satisfy the axioms of unconditional probability. Also note that, syntactically, the
members of our conditional language L⇒ are “flat”: they allow neither for nestings
of conditionals nor for the application of any of the connectives of classical
propositional logic to conditionals. For instance, L⇒ does not include negations
of conditionals.
When we are going to study logical consequence relations for this conditional
language L⇒, we will focus on finite sets KB⇒ ⊆ L⇒ of such conditionals, which
will then function as finite sets of conditional premises or as finite (probabilistic)
conditional knowledge bases (as theoretical computer scientists would say). We
use the notation ‘KB⇒’, with the subindex ‘⇒’, in order to signal that any such
KB⇒ is a set of conditionals. Although we do not include any “factual”, that is,
non-conditional, formulas in L⇒ or in any KB⇒ ⊆ L⇒, for many applications
one may think of conditionals > ⇒ α with the tautological antecedent > as being
logically equivalent to the factual formula α in L. In particular, this makes good
sense if one thinks of ⇒ as representing the indicative ‘if-then’ in natural lan-
guage, and, accordingly, Adams does treat α and > ⇒ α as logically equivalent.
We are now ready to present six probabilistic semantics for high probability
conditionals. Each semantics—except for the infinitesimal semantics—is based
essentially on some probability semantics that had been suggested by Adams (see,
e.g., Adams 1966, 1974, 1975, 1986, 1996, 1998, and Adams and Levine 1975).
Adams’ semantic systems were further refined and extended by Pearl (1988),
McGee (1989), Lehmann and Magidor (1992), Edgington (1995), Goldszmidt
and Pearl (1996), Schurz (1997, 1998), Snow (1999), Bamber (2000), Biazzo et
al. (2002), Halpern (2003), Arlo-Costa and Parikh (2005), and Leitgeb (2012a,
b).
Each of the semantic systems below includes the definition of a logical en-
tailment relation that holds between finite sets of high probability conditionals
and further such conditionals. Each of these definitions is plausible in
itself, but they rest on different philosophical ideas
and motivations: While semantics 2, 4, 6 are defined in terms of truth preserva-
tion, semantics 1, 3, and 5 do not involve the notion of truth of a high probability
conditional in a model at all. Whereas semantics 1 and 2 understand logical con-
sequence dynamically in terms of ‘the more likely the premises get, the more
likely the conclusion gets’, all the other semantics are static. Whereas semantics 3
and 5 concern the reliability of reasoning with conditionals, demanding that the
probability of a conclusion not drop too far below the probabilities of the
premises, semantics 4 and 6 take a more idealized viewpoint by considering prob-
abilistic orderings of worlds or infinitesimal probabilities. But, surprisingly, all of
these definitions can be shown ultimately to determine (extensionally) one and the
same relation of logical consequence for high probability conditionals, as we are
going to see later. The resulting sound and complete deductive system of logical
axioms and rules is Adams’ logic P of conditionals, which therefore turns out to
be robustly justified on quite diverse semantic grounds.
According to the first semantic system that we introduce, a set of high proba-
bility conditionals entails another high probability conditional if and only if: the
higher the probabilities of the conditionals contained in the premise set, the higher
also the probability of the conditional conclusion. This leads to a kind of “con-
tinuity” semantics for high probability conditionals which, accordingly, employs
an ε-δ-criterion:
Definition 1 (Continuity Semantics for High Probability Conditionals)
• We say that
KB⇒ �cont α⇒ β
iff for all ε > 0 there is a δ > 0, such that for all probability measures P:
if for all ϕ⇒ ψ in KB⇒ it holds that P(ψ|ϕ) > 1 − δ, then P(β|α) > 1 − ε
(that is: if P(ψ|ϕ) is “high” for all ϕ⇒ ψ in KB⇒, also P(β|α) is “high”).
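To illustrate the ε-δ-criterion, consider the instance KB⇒ = {α ⇒ β, α ⇒ γ} with conclusion α ⇒ (β ∧ γ): here δ = ε/2 witnesses continuity-entailment, since P(β ∧ γ|α) ≥ P(β|α) + P(γ|α) − 1. The following Monte Carlo sketch is entirely our own; the cell encoding and the biased sampling scheme are assumptions made purely for illustration:

```python
import random

# Worlds are lumped into 8 cells by the truth values of alpha (bit 2),
# beta (bit 1), and gamma (bit 0); this encoding is our own convention.
random.seed(0)

EPS = 0.1
DELTA = EPS / 2  # the witnessing delta for this particular entailment

def random_measure():
    # Bias cell 7 (alpha, beta, gamma all true) so the premises often hold.
    raw = [random.random() + 1e-9 for _ in range(8)]
    raw[7] *= 1000
    s = sum(raw)
    return [x / s for x in raw]

def cond(P, num_cells, den_cells):
    den = sum(P[i] for i in den_cells)
    return sum(P[i] for i in num_cells) / den if den > 0 else 1.0

A = [i for i in range(8) if i & 4]            # alpha-cells
AB = [i for i in A if i & 2]                  # alpha & beta
AC = [i for i in A if i & 1]                  # alpha & gamma
ABC = [i for i in A if (i & 2) and (i & 1)]   # alpha & beta & gamma

ok, triggered = True, 0
for _ in range(2000):
    P = random_measure()
    if cond(P, AB, A) > 1 - DELTA and cond(P, AC, A) > 1 - DELTA:
        triggered += 1
        # Guaranteed: P(beta & gamma|alpha) >= P(beta|alpha) + P(gamma|alpha) - 1
        ok = ok and cond(P, ABC, A) > 1 - EPS
```

Every sampled measure whose premise probabilities exceed 1 − δ also gives the conclusion a probability above 1 − ε, just as the inequality above guarantees.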
It is well known that the definition of continuous functions over the reals can
either be stated in terms of an ε-δ-criterion or in terms of the preservation of
limits along sequences of real numbers. Similarly, the continuity semantics
above allows for a restatement in terms of a sequence semantics, where a sequence
of probability measures is defined to satisfy a high probability conditional if the
conditional probability associated with the conditional is identical to 1 “in the
limit” of the sequence. Adams (1986, p.277) hints at such a type of semantics in a
footnote. Variants of such a sequence semantics—but defined on more expressive
languages than our simple L—are employed by Halpern (2003) in his system
of inductive reasoning for statistical and subjective probabilities, and by Leitgeb
(2012a, b) in his probability logic for counterfactuals:
Definition 2 (Sequence Semantics for High Probability Conditionals)
• A probabilistic sequence model Mseq for high probability conditionals is a
sequence (Pn)n∈N of probability measures.
• Relative to a probabilistic sequence model Mseq = (Pn)n∈N we can define:
Mseq �seq α⇒ β
iff the real sequence (Pn(β|α))n∈N converges, and
lim_{n→∞} Pn(β|α) = 1
(that is: Pn(β|α) “tends” towards 1 for increasing n).
• Mseq �seq KB⇒ iff for every α⇒ β in KB⇒ it holds that Mseq �seq α⇒ β.
• We say that
KB⇒ �seq α⇒ β
(KB⇒ sequence-entails α⇒ β) iff
for every probabilistic sequence model Mseq:
if Mseq �seq KB⇒, then Mseq �seq α⇒ β.
Next, we turn to a semantics for high probability conditionals that does not
involve anything like probabilities getting “arbitrarily close to 1”. A set of high
probability conditionals will instead be said to entail a high probability condi-
tional if the uncertainty associated with the latter is smaller than or equal to the
sum of the uncertainties of the conditionals contained in the premise set; that is,
if the uncertainty of the conditional to be entailed is bounded additively by the
uncertainties that are associated with the premise conditionals. In contrast with
the two semantic systems from before, if a set of high probability conditionals en-
tails another such conditional in this sense, there is always a lower bound for the
probability that is associated with the conclusion, and this lower bound can
be computed easily. As Schurz (1997) points out, the resulting entail-
ment relation approximates the so-called “quasi-tightness” property of inferences
that was defined in Frisch and Haddawy (1994). This kind of uncertainty seman-
tics, which had been introduced by Adams again, was taken up and defended for
example by Edgington in her theories of indicative conditionals (Edgington 1995)
and vague terms (Edgington 1996); similarly, Field (2009) models his account of
how logical implication interacts normatively with degrees of belief after this kind
of (Suppes-)Adams-style uncertainty semantics:
Definition 3 (Uncertainty Semantics for High Probability Conditionals)
• We say that
KB⇒ �unc α⇒ β
(KB⇒ uncertainty-entails α⇒ β) iff
for every probability measure P (and where a sum over an empty set of
indices is defined to be 0):
P(β |α ) ≥ 1 − ∑_{ϕ⇒ψ ∈ KB⇒} Unc(ψ |ϕ ),

that is,

Unc(β |α ) ≤ ∑_{ϕ⇒ψ ∈ KB⇒} Unc(ψ |ϕ )

(in words: P(β |α ) is “high” if the uncertainties Unc(ψ |ϕ ) are very “low”
for all ϕ ⇒ ψ ∈ KB⇒; or: for all probability measures, it holds that the
uncertainty associated with α⇒ β is bounded from above by the sum of the
uncertainties associated with the premises).
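As a numerical illustration (our own sketch, with an assumed cell encoding), one can check that the uncertainty bound holds for an instance of the Cautious Cut rule discussed in the next subsection, with premises (α ∧ β) ⇒ γ and α ⇒ β and conclusion α ⇒ γ:

```python
import random

# Cells are indexed by bits: alpha = bit 2, beta = bit 1, gamma = bit 0
# (our own encoding, chosen for illustration).
random.seed(1)

def random_measure():
    raw = [random.random() + 1e-6 for _ in range(8)]  # strictly positive cells
    s = sum(raw)
    return [x / s for x in raw]

def cond(P, num_cells, den_cells):
    den = sum(P[i] for i in den_cells)
    return sum(P[i] for i in num_cells) / den if den > 0 else 1.0

A = [i for i in range(8) if i & 4]     # alpha
AB = [i for i in A if i & 2]           # alpha & beta
ABG = [i for i in AB if i & 1]         # alpha & beta & gamma
AG = [i for i in A if i & 1]           # alpha & gamma

bound_holds = True
for _ in range(5000):
    P = random_measure()
    unc_premises = (1 - cond(P, ABG, AB)) + (1 - cond(P, AB, A))
    unc_conclusion = 1 - cond(P, AG, A)
    # Unc(gamma|alpha) <= Unc(gamma|alpha & beta) + Unc(beta|alpha)
    bound_holds = bound_holds and unc_conclusion <= unc_premises + 1e-9
```

The bound never fails on sampled measures, in line with the fact that Cautious Cut is valid with respect to the uncertainty semantics.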
The next semantics evaluates a high probability conditional relative to a
certain kind of probability measure that ranks worlds by polynomial “orders of
magnitude”. A high probability conditional is satisfied by such a probability mea-
sure if its associated conditional probability is of the maximal order of magnitude
(compare Snow’s 1999 “atomic bound probabilities” and Benferhat et al. 1997
on their so-called “big-stepped probabilities”). The order-of-magnitude mapping
may also be seen as a selection function in the sense of Stalnaker (1968) or as de-
termining a special kind of sphere system of worlds in the sense of Lewis (1973).
This explains the formal correspondence between the logic of high probability
conditionals in the next subsection and Stalnaker’s and Lewis’ logical systems for
counterfactuals. But the intended interpretation of Stalnaker’s and Lewis’ order-
ings in terms of similarity or closeness to the actual world differs from the purely
probabilistic ordering of worlds below. Probabilistic order-of-magnitude models
are also close to ranked models along the lines of Kraus, Lehmann and Magidor
(1990) and Lehmann and Magidor (1992)—which explains the formal correspon-
dence between the logic in the next subsection with systems well-known from
nonmonotonic reasoning—and to ranking functions (or ordinal conditional func-
tions) in the sense of Spohn (1988, 2012). This is what this order of magnitude
semantics looks like in more formal terms:
Definition 4 (Order of Magnitude Semantics for High Probability Conditionals)
• A probabilistic order-of-magnitude model Mom for high probability condi-
tionals is a bijective mapping om : W → {0, . . . , n − 1}.
(So om is both one-to-one and onto: om(w) is the “probabilistic order of
magnitude” of w. The cardinality of W, card(W), is n.)
• Relative to a probabilistic order-of-magnitude model Mom (= om), and relative
to some “small” real number v ∈ [0, 1] (say, v < 1/2), we can define:

– Let Pom be the unique probability measure that satisfies:

Pom({w}) = v^om(w) · (1 − v) for om(w) < card(W) − 2,

Pom({w}) = v^(card(W)−2) for om(w) = card(W) − 2,

Pom({w}) = 0 for om(w) = card(W) − 1.

(The middle clause ensures that these values sum to 1.)

–

Mom �om α⇒ β

iff Pom(β |α ) > 1 − v

(that is: Pom(β |α ) is “high” or corresponds to the highest order of
magnitude, v^0 · (1 − v) = 1 − v).
Note that whether Mom �om α ⇒ β or not is actually independent of the
exact choice of v.
• Mom �om KB⇒ iff for every α⇒ β ∈ KB⇒ it holds that Mom �om α⇒ β.
• We say that
KB⇒ �om α⇒ β
(KB⇒ order–of-magnitude-entails α⇒ β) iff
for every probabilistic order-of-magnitude model Mom:
if Mom �om KB⇒, then Mom �om α⇒ β.
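An order-of-magnitude model is easy to compute with. The following sketch uses our own hypothetical choices of four worlds, a bijection om, and v = 0.1, with the values on singletons chosen so that they sum to 1:

```python
# A toy order-of-magnitude model; all particular numbers are our own choices.
V = 0.1
WORLDS = [0, 1, 2, 3]          # stand-ins for four truth-value assignments
om = {0: 0, 1: 1, 2: 2, 3: 3}  # a bijective order-of-magnitude assignment

def p_om(w):
    """P_om({w}): powers of v scaled so that the singleton values sum to 1."""
    n = len(WORLDS)  # card(W)
    if om[w] < n - 2:
        return V ** om[w] * (1 - V)
    if om[w] == n - 2:
        return V ** (n - 2)
    return 0.0  # the world of maximal order receives probability 0

def prob(prop):
    return sum(p_om(w) for w in prop)

def cond(beta, alpha):
    pa = prob(alpha)
    return prob(alpha & beta) / pa if pa > 0 else 1.0

def satisfies(alpha, beta):
    """M_om satisfies alpha => beta iff P_om(beta|alpha) exceeds 1 - v."""
    return cond(beta, alpha) > 1 - V
```

Here Pom assigns 0.9, 0.09, 0.01, and 0 to the four worlds; the conditional with tautological antecedent and consequent {0, 1} is satisfied, while the one with consequent {2, 3} is not.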
The next semantics defines a set of high probability conditionals to entail a
high probability conditional, if, whenever the conditional probabilities that are
associated with the premises are “close” to 1 (where the referent of ‘close’ is
determined relative to the number of premises), the conditional probability that is
associated with the conclusion, say, α ⇒ β, is greater than 1/2 and hence greater
than the conditional probability that is associated with α ⇒ ¬β. Since in any
such case the set of β-worlds constitutes the “majority” within the set of α-worlds
(as measured by the probability measure in question), we call this the ‘majority
semantics’. Logical consequence given by this semantics therefore consists in the
premises making the conclusion more likely than not:
Definition 5 (Majority Semantics for High Probability Conditionals)
• Let KB⇒ = {ϕ1 ⇒ ψ1, . . . , ϕn ⇒ ψn}:
We say that
KB⇒ �maj α⇒ β
(KB⇒ majority-entails α⇒ β) iff
for all probability measures P:
if P(ψ1|ϕ1) > 1 − 1/(2n), . . . , P(ψn|ϕn) > 1 − 1/(2n), then P(β|α) > 1/2
(that is, if the premise probabilities are “high”, then P(β|α) is greater than
P(¬β|α)).
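The reason why the premise threshold 1 − 1/(2n) must shrink with the number n of premises can be seen in a small numerical example; the cell weights below, conditional on α, are our own toy numbers:

```python
# With n = 2 premises whose conditional probabilities merely exceed 1/2,
# an And-style conclusion can fail to be more likely than not.
cells = {
    ("b", "g"): 0.2,        # beta and gamma
    ("b", "not-g"): 0.4,    # beta without gamma
    ("not-b", "g"): 0.4,    # gamma without beta
    ("not-b", "not-g"): 0.0,
}

p_beta = sum(v for (b, g), v in cells.items() if b == "b")    # P(beta|alpha) = 0.6
p_gamma = sum(v for (b, g), v in cells.items() if g == "g")   # P(gamma|alpha) = 0.6
p_both = cells[("b", "g")]                                    # P(beta & gamma|alpha) = 0.2

weak_premises_hold = p_beta > 0.5 and p_gamma > 0.5   # both premises "likely"
conclusion_fails = not (p_both > 0.5)                 # conclusion not "likely"

# If instead both premise probabilities exceeded 1 - 1/(2*2) = 0.75, then
# P(beta & gamma|alpha) >= p_beta + p_gamma - 1 > 0.5 would be guaranteed.
```

So thresholds fixed independently of the number of premises would not do; the 1/(2n) scaling compensates for the accumulation of risk across premises.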
The final semantics for high probability conditionals that we will discuss was
suggested by Lehmann and Magidor (1992), pp.48–53, and it is special in so far
as it presupposes the nonstandard analysis of real numbers. Nonstandard analysis
adds infinitely small numbers (the so-called ‘infinitesimals’) and infinitely large
numbers to the standard set of real numbers. Apart from the introduction to non-
standard analysis that is contained in Lehmann and Magidor (1992) itself, brief but
useful accounts of nonstandard analysis can also be found in section 4.4 of Chang
and Keisler (1990), and in more informal terms, in Adams (1998), pp.253–256.
Definition 6 (Infinitesimal Semantics for High Probability Conditionals)
• An infinitesimal probabilistic model Minf for high probability conditionals
is a nonstandard probability measure P : ℘(W)→ [0, 1]∗, that is, probabili-
ties are nonstandard reals non-strictly between 0 and 1, such that P(W) = 1,
P(∅) = 0, and finite additivity is satisfied.
• Relative to an infinitesimal probabilistic model Minf (= P) we can define:

Minf �inf α⇒ β

iff 1 − P(β |α ) is infinitesimal, that is,

for all standard reals ε ∈ R with ε > 0: 1 − P(β |α ) < ε

(that is: P(β |α ) is either identical to 1 or “infinitely close” to 1).

• Minf �inf KB⇒ iff for every α⇒ β ∈ KB⇒ it holds that Minf �inf α⇒ β.

• We say that

KB⇒ �inf α⇒ β

(KB⇒ infinitesimally entails α⇒ β) iff

for every infinitesimal probabilistic model Minf:

if Minf �inf KB⇒, then Minf �inf α⇒ β.
This concludes our series of semantic systems for high probability conditionals.
We are now ready to turn to a comparison between these different versions of a
high probability semantics in terms of their respective logical consequence rela-
tions. Surprisingly, the semantic systems that we presented in this subsection turn
out to be extensionally mutually equivalent in the following sense (for the proof
of this theorem, and for information on which proofs in the relevant literature the
theorem is based, see Leitgeb 2004, pp.177f):
Theorem 7 (Equivalence of the Different Versions of Probability Semantics with
respect to Entailment)
Let KB⇒ ⊆ L⇒, α⇒ β ∈ L⇒; the following claims are equivalent:
1. KB⇒ �cont α⇒ β
2. KB⇒ �seq α⇒ β
3. KB⇒ �unc α⇒ β
4. KB⇒ �om α⇒ β
5. KB⇒ �maj α⇒ β

6. KB⇒ �inf α⇒ β.
In the next subsection, we will determine the very consequence relation that
corresponds to these semantic systems in proof-theoretic terms.
3.2 Proof Theory for High Probability Conditionals
Consider the following rules of inference for conditionals in L⇒ (where ‘`’ de-
notes the derivability relation of classical propositional logic):
• Reflexivity: α ⇒ α

• Left Equivalence: from α ` β, β ` α, α ⇒ γ, infer β ⇒ γ

• Right Weakening: from γ ⇒ α, α ` β, infer γ ⇒ β

• Cautious Cut: from (α ∧ β) ⇒ γ, α ⇒ β, infer α ⇒ γ

• Cautious Monotonicity: from α ⇒ β, α ⇒ γ, infer (α ∧ β) ⇒ γ
Note that Reflexivity is premise-free (so it is really an axiom scheme).
Kraus, Lehmann and Magidor (1990), section 3, refer to the system of rules
above as the system C of cumulative reasoning (although they spell things out
in terms of nonmonotonic consequence relations rather than in terms of condi-
tionals). Cumulativity, that is, Cautious Cut and Cautious Monotonicity taken
together, has been suggested by Gabbay (1984) to be a valid closure property of
plausible reasoning: Cautious Monotonicity expresses that importing consequents
(such as β) into an antecedent (so that α is turned into α∧β) does not subtract from
the original antecedent’s (α’s) inferential power. In turn, Cautious Cut expresses
that importing consequents in this way does not add to the antecedent’s inferential
power either: for suppose the conclusion α ⇒ γ fails; then at least one of the
two premises must fail, so if α ⇒ β does hold, making β a consequence of α,
then (α ∧ β) ⇒ γ must fail, that is, γ cannot be inferred from α even by
importing that consequence into the antecedent.
Furthermore, we also consider the following rule:
• Disjunction: from α ⇒ γ, β ⇒ γ, infer (α ∨ β) ⇒ γ
The system that results from adding the Disjunction rule to system C is called
the system P of preferential reasoning by Kraus, Lehmann and Magidor (1990),
section 5. This stronger system P is one of the standard systems of nonmonotonic
logic, and it turns out to be sound and complete with respect to many different se-
mantics of nonmonotonic logic (some of them are collected in Gabbay et al. 1994;
see also Gardenfors and Makinson 1994, Chapter 4.3 of Fuhrmann 1997, Benfer-
hat et al. 1997, and Benferhat et al. 2000). Psychological findings, though still on
a very preliminary level, indicate that P incorporates some of the rationality pos-
tulates governing human commonsense reasoning with conditionals (see Pfeifer
and Kleiter 2005, 2010). P also coincides with the “flat” fragment of Stalnaker’s
and Lewis’ logic(s) for counterfactuals.
The derivability of conditionals α ⇒ β from a finite set KB⇒ of conditionals
by means of the rules above—resulting in the deductive consequence relations `C
and `P, respectively—is defined just as usual, that is, analogously to the definition
of derivability of formulas from formulas in classical propositional logic.
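For a rough feel for how such derivations proceed, here is a minimal sketch, entirely our own and deliberately restricted: antecedents and consequents are conjunctions of atoms (frozensets of names), and only the And and Cautious Monotonicity rules are applied in a forward closure:

```python
from itertools import product

# Restricted rule instances over conditionals (antecedent, consequent):
#   And:                   a => b, a => c  yields  a => (b and c)
#   Cautious Monotonicity: a => b, a => c  yields  (a and b) => c

def close(kb):
    """Forward closure of a finite conditional knowledge base under the
    two rules above; terminates because the atom vocabulary is finite."""
    kb = set(kb)
    changed = True
    while changed:
        changed = False
        for (a1, c1), (a2, c2) in product(list(kb), repeat=2):
            if a1 != a2:
                continue
            for derived in ((a1, c1 | c2),   # And
                            (a1 | c1, c2)):  # Cautious Monotonicity
                if derived not in kb:
                    kb.add(derived)
                    changed = True
    return kb

f = frozenset
KB = {(f({"bird"}), f({"wings"})), (f({"bird"}), f({"flies"}))}
closure = close(KB)
# Cautious Monotonicity yields, e.g., (bird and wings) => flies.
```

A full derivability checker for `C or `P would additionally need classical entailment tests for Left Equivalence, Right Weakening, and Disjunction; this fragment only illustrates the fixed-point style of such computations.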
The following rules can be shown to be (meta-)derivable from the systems
introduced above:
Lemma 8 (Kraus, Lehmann and Magidor 1990, pp.179–180)
The following rules are derivable in C (that is: from Reflexivity+Left Equiva-
lence+Right Weakening+Cautious Cut+Cautious Monotonicity):
1. And: from α ⇒ β, α ⇒ γ, infer α ⇒ (β ∧ γ)

2. Equivalence: from α ⇒ β, β ⇒ α, α ⇒ γ, infer β ⇒ γ

3. Modus Ponens in the Consequent: from α ⇒ (β → γ), α ⇒ β, infer α ⇒ γ

4. Supra-Classicality: from α ` β, infer α ⇒ β
Lemma 9 (Kraus, Lehmann and Magidor 1990, p.191)
The following rules are derivable in P (that is: from Reflexivity+Left Equiva-
lence+Right Weakening+Cautious Cut+Cautious Monotonicity+Disjunction; we
label the derivable rules in the same way as Kraus, Lehmann and Magidor 1990):
1. (S): from (α ∧ β) ⇒ γ, infer α ⇒ (β → γ)

2. (D): from (α ∧ β) ⇒ γ, (α ∧ ¬β) ⇒ γ, infer α ⇒ γ
Finally, we can relate the semantic systems of the previous subsection to the
system of rules specified above by means of a soundness and completeness theo-
rem (see Leitgeb 2004, chapter 10, for the proof, and for the proofs in the relevant
parts of the literature on which the theorem is based):
Theorem 10 (Soundness and Completeness of P)
Let KB⇒ ⊆ L⇒, α ⇒ β ∈ L⇒; then each of the claims in theorem 7 is
equivalent to:
KB⇒ `P α⇒ β
That is: the system P is sound and complete with respect to the continuity seman-
tics, the sequence semantics, the uncertainty semantics, the order of magnitude
semantics, the majority semantics, and the infinitesimal semantics for high prob-
ability conditionals.
In contrast, none of the following rules are (meta-)derivable in P nor are they
valid with respect to any of the semantics of the last subsection, even though their
counterparts for material conditionals are of course valid:
• Contraposition: from α ⇒ β, infer ¬β ⇒ ¬α

• Transitivity: from α ⇒ β, β ⇒ γ, infer α ⇒ γ

• Monotonicity (Strengthening of the Antecedent): from α ⇒ γ, infer (α ∧ β) ⇒ γ
As Bennett (2003) argues in his chapter 9 (and as had been argued before by,
e.g., Adams 1975 and Edgington 1995), none of these rules of inference is par-
ticularly plausible for the indicative if-then in natural language. Accordingly, in
nonmonotonic reasoning all of these rules are normally given up as applying to
default conditionals or (if reformulated accordingly) nonmonotonic consequence
relations. However, we have already seen weakenings of these three rules to be
contained in system P: in particular, Cautious Cut may be regarded as a weak-
ening of Transitivity, and Cautious Monotonicity is clearly a weakened version
of Monotonicity. (See Johnson and Parikh 2008 for an argument that, in a sense
explained in their paper, the monotonicity rule is nevertheless “almost valid” for
probabilistic conditionals.)
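A concrete probability measure makes the failure of Monotonicity vivid; the "penguin" weights below, conditional on bird, are our own hypothetical numbers:

```python
# P(flies|bird) is high, but strengthening the antecedent to bird & penguin
# drops the conditional probability to 0; the weights are illustrative only.
birds = {
    ("penguin", "flies"): 0.0,
    ("penguin", "grounded"): 0.1,
    ("other", "flies"): 0.9,
    ("other", "grounded"): 0.0,
}

p_fly_given_bird = sum(v for (kind, f), v in birds.items() if f == "flies")
p_penguin_given_bird = sum(v for (kind, f), v in birds.items() if kind == "penguin")
p_fly_given_penguin_bird = birds[("penguin", "flies")] / p_penguin_given_bird

# bird => flies holds on a high-probability reading, but
# (bird & penguin) => flies fails badly under the same measure.
```

The same measure also undermines Transitivity (via penguin ⇒ bird and bird ⇒ flies) and, with ¬flies in place of flies, Contraposition.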
The exchange of ideas between logic and probability theory, and the system-
atic study of jointly logical and probabilistic systems, has had a favourable effect
on both areas in the past. It may have an even more favourable effect on the two
areas in the future.
Acknowledgements: We are grateful to Stanislav Speranski, Alan Hajek, John
Cusbert, and Edward Elliott for comments on a previous draft of this chapter.
Work on this paper was supported generously by the Alexander von Humboldt
Foundation.
References
[1] Abadi, M. and Halpern, J.Y., 1994: “Decidability and Expressiveness for
First-Order Logics of Probability”, Information and Computation 112, 1–
36.
[2] Adams, E.W., 1966: “Probability and the Logic of Conditionals”, in: Hin-
tikka and Suppes (1966), 265–316.
[3] Adams, E.W., 1974: “The Logic of ‘Almost All’”, Journal of Philosophical
Logic 3, 3–17.
[4] Adams, E.W., 1975: The Logic of Conditionals, Dordrecht: D. Reidel.
[5] Adams, E.W., 1986: “On the Logic of High Probability”, Journal of Philo-
sophical Logic 15, 255–279.
[6] Adams, E.W., 1996: “Four Probability-Preserving Properties of Infer-
ences”, Journal of Philosophical Logic 25, 1–24.
[7] Adams, E.W., 1998: A Primer of Probability Logic, Stanford: CSLI Lec-
ture Notes.
[8] Adams, E.W. and Levine, H.P., 1975: “On the Uncertainties Transmit-
ted from Premisses to Conclusions in Deductive Inferences”, Synthese 30,
429–460.
[9] Arlo-Costa, H., 2005: “Non-Adjunctive Inference and Classical Modali-
ties”, Journal of Philosophical Logic 34, 581–605.
[10] Arlo-Costa, H. and Parikh, R., 2005: “Conditional Probability and Defea-
sible Inference”, Journal of Philosophical Logic 34, 97–119.
[11] Bacchus, F., 1990a: “On Probability Distributions Over Possible Worlds”,
in: Proceedings of the Fourth Annual Conference on Uncertainty in Artifi-
cial Intelligence, UAI’1988, Amsterdam: North-Holland, 217–226.
[12] Bacchus, F., 1990b: Representing and Reasoning with Probabilistic Knowl-
edge, Cambridge: The MIT Press.
[13] Bacchus, F., Grove, A.J., Halpern, J.Y., and Koller, D., 1996: “From Sta-
tistical Knowledge Bases to Degrees of Belief”, Artificial Intelligence 87,
75–143.
[14] Baltag, A. and Smets, S., 2008: “Probabilistic Dynamic Belief Revision”,
Synthese 165, 179–202.
[15] Bamber, D., 2000: “Entailment with Near Surety of Scaled Assertions of
High Conditional Probability”, Journal of Philosophical Logic 29, 1–74.
[16] Benferhat, S., Dubois, D., and Prade, H., 1997: “Possibilistic and Standard
Probabilistic Semantics of Conditional Knowledge”, Journal of Logic and
Computation 9, 873–895.
[17] Benferhat, S., Saffiotti, A., and Smets, P., 2000: “Belief Functions and
Default Reasoning,” Artificial Intelligence 122, 1–69.
[18] Bennett, J., 2003: A Philosophical Guide to Conditionals, Oxford: Claren-
don Press.
[19] Van Benthem, J., Gerbrandy, J., and Kooi, B., 2009: “Dynamic Update
with Probabilities”, Studia Logica 93, 67–96.
[20] Biazzo, V., Gilio, A., Lukasiewicz, T., and Sanfilippo, G., 2002: “Proba-
bilistic Logic under Coherence, Model-Theoretic Probabilistic Logic, and
Default Reasoning in System P”, Journal of Applied Non-Classical Logics
12, 189–213.
[21] Brewka, G. (ed.), 1996: Principles of Knowledge Representation, Stanford:
CSLI Publications and FoLLI.
[22] Brewka, G., Dix, J., Konolige, K., 1997: Nonmonotonic Reasoning. An
Overview, Stanford: CSLI Lecture Notes 73.
[23] Boole, G., 1854: An Investigation of The Laws of Thought on Which are
Founded the Mathematical Theories of Logic and Probabilities, London:
Macmillan.
[24] Buchak, L., forthcoming: “Belief, Credence, and Norms”, Philosophical
Studies.
[25] Burgess, J.P., 1969: “Probability Logic”, The Journal of Symbolic Logic
34, 264–274.
[26] Caie, M., 2013: “Rational Probabilistic Incoherence”, Philosophical Re-
view 122, 527–575.
[27] Carnap, R., 1950: Logical Foundations of Probability, Chicago: University
of Chicago Press.
[28] Chang, C.C., Keisler, H.J., 1990: Model Theory, Amsterdam: North-
Holland.
[29] Christensen, D., 2004: Putting Logic in Its Place, Oxford: Clarendon Press.
[30] Christiano, P., Yudkowsky, E., Herreshoff, M., and Barasz, M., unpub-
lished: “Definability of Truth in Probabilistic Logic”, unpublished draft.
[31] Cross, C.B., 1993: “From Worlds to Probabilities: A Probabilistic Seman-
tics for Modal Logic”, Journal of Philosophical Logic 22, 169–192.
[32] Dubois, D. and Prade, H., 1996: “Non-Standard Theories of Uncertainty in
Plausible Reasoning”, in: G. Brewka (1996), 1–32.
[33] Edgington, D., 1995: “On Conditionals”, Mind 104, 235–329.
[34] Edgington, D., 1996: “Vagueness by Degrees”, in: R. Keefe and P. Smith
(eds.), Vagueness: A Reader, Cambridge: MIT Press, 617–630.
[35] Fagin, R., 1976: “Probabilities on Finite Models”, Journal of Symbolic
Logic 41, 50–58.
[36] Fagin, R. and Halpern, J.Y., 1994: “Reasoning about Knowledge and Prob-
ability”, Journal of the ACM 41, 340–367.
[37] Fagin, R., Halpern, J.Y., and Megiddo, N., 1990: “A Logic for Reasoning
About Probabilities”, Information and Computation 87, 78–128.
[38] Fenstad, J.E., 1967: “Representations of Probabilities Defined on First Or-
der Languages”, in: J.N. Crossley (ed.), Sets, Models and Recursion The-
ory, Amsterdam: North-Holland, 156–172.
[39] Field, H., 1977: “Logic, Meaning, and Conceptual Role”, The Journal of
Philosophy 74, 379–409.
[40] Field, H., 2009: “What is the Normative Role of Logic?”, Proceedings of
the Aristotelian Society Supplementary Volume LXXXIII, 251–268.
[41] Foley, R., 1993: Working Without a Net, Oxford: Oxford University Press.
[42] Van Fraassen, B., 1981: “Probabilistic Semantics Objectified: I. Postulates
and Logics”, Journal of Philosophical Logic 10, 371–394.
[43] Van Fraassen, B., 1995: “Belief and the Problem of Ulysses and the
Sirens”, Philosophical Studies 77, 7–37.
[44] Frisch, A.M. and Haddawy, P., 1988: “Probability as a Modal Operator”,
in: Proceedings of the 4th Workshop on Uncertainty in AI, Minneapolis,
MN, 109–118.
[45] Frisch, A.M. and Haddawy, P., 1994: “Anytime Deduction for Probabilistic
Logic”, Artificial Intelligence 69, 93–122.
[46] Fuhrmann, A., 1997: An Essay on Contraction, Stanford: CSLI Publica-
tions.
[47] Gabbay, D.M., 1984: “Theoretical Foundations for Non-Monotonic Rea-
soning in Expert Systems”, in: K.R. Apt (ed.), Logics and Models of Con-
current Systems, Berlin: Springer, 439–458.
[48] Gabbay, D.M., Hogger, C.J., and Robinson, J.A. (eds.), 1994: Handbook of
Logic in Artificial Intelligence and Logic Programming 3, Oxford: Claren-
don Press, 35–110.
[49] Gaifman, H., 1964: “Concerning Measures in First Order Calculi”, Israel
Journal of Mathematics 2, 1–18.
[50] Gaifman, H. and Snir, M., 1982: “Probabilities Over Rich Languages, Test-
ing and Randomness”, The Journal of Symbolic Logic 47, 495–548.
[51] Gaifman, H., 1986: “A Theory of Higher Order Probabilities”, in: Proceed-
ings of the Conference on Theoretical Aspects of Reasoning about Knowl-
edge, Monterey: California, 275–292.
[52] Gardenfors, P., 1975: “Qualitative Probability as an Intensional Logic”,
Journal of Philosophical Logic 4, 171–185.
[53] Gardenfors, P. and Makinson, D., 1994: “Nonmonotonic Inference Based
on Expectations,” Artificial Intelligence 65, 197–245.
[54] Goldszmidt, M. and Pearl, J., 1996: “Qualitative Probabilities for Default
Reasoning, Belief Revision, and Causal Modeling”, Artificial Intelligence
84, 57–112.
[55] Haenni, R., 2005: “Unifying Logical and Probabilistic Reasoning”, in:
L. Godo (ed.), Symbolic and Quantitative Approaches to Reasoning with
Uncertainty, Lecture Notes in Artificial Intelligence Vol. 3571, Berlin:
Springer, 788–799.
[56] Hajek, A., 1989: “Probabilities of Conditionals–Revisited”, Journal of
Philosophical Logic 18, 423–428.
[57] Hailperin, T., 1937: “Foundations of Probability in Mathematical Logic”,
Philosophy of Science 4, 125–150.
[58] Hailperin, T., 1984: “Probability Logic”, Notre Dame Journal of Formal
Logic 25, 198–212.
[59] Hailperin, T., 1996: Sentential Probability Logic, Bethlehem, PA: Lehigh
University Press.
[60] Hailperin, T., 2000: “Probability Semantics for Quantifier Logic”, Journal
of Philosophical Logic 29, 207–239.
[61] Halpern, J.Y., 1990: “An Analysis of First-Order Logics of Probability”,
Artificial Intelligence 46, 311–350.
[62] Halpern, J.Y., 1991: “The Relationship between Knowledge, Belief, and
Certainty”, Annals of Mathematics and Artificial Intelligence 4, 301–322.
[63] Halpern, J.Y., 2001: “Lexicographic Probability, Conditional Probability,
and Nonstandard Probability”, in: Proceedings of the Eighth Conference
on Theoretical Aspects of Rationality and Knowledge. Ithaca, NY: Morgan
Kaufmann, 17–30.
[64] Halpern, J.Y., 2003: Reasoning About Uncertainty, Cambridge, Mass.: The
MIT Press.
[65] Halpern, J.Y. and Rabin, M.O., 1987: “A Logic to Reason about Likeli-
hood”, Artificial Intelligence 32, 379–405.
[66] Hamblin, C.L., 1959: “The Modal ‘Probably’”, Mind 68, 234–240.
[67] Hawthorne, J., 1996: “On the Logic of Nonmonotonic Conditionals and
Conditional Probabilities”, Journal of Philosophical Logic 25, 185–218.
[68] Hawthorne, J., 2007: “Nonmonotonic Conditionals that Behave Like Con-
ditional Probabilities Above a Threshold”, Journal of Applied Logic 5,
625–637.
[69] Hawthorne, J. and Makinson, D., 2007: “The Quantitative/Qualitative Wa-
tershed for Rules of Uncertain Inference”, Studia Logica 86, 247–297.
[70] Heifetz, A. and Mongin, P., 2001: “Probability Logic for Type-Spaces”,
Games and Economic Behavior 35, 31–53.
[71] Hempel, C.G., 1962: “Deductive-Nomological vs. Statistical Explanation”,
in: H. Feigl and G. Maxwell (eds.), Minnesota Studies in the Philosophy of
Science III, Minneapolis: University of Minnesota Press, 98–169.
[72] Hilpinen, R., 1968: Rules of Acceptance and Inductive Logic, Acta Philo-
sophica Fennica 22, Amsterdam: North-Holland.
[73] Hintikka, J. and Suppes, P. (eds.), 1966: Aspects of Inductive Logic, Ams-
terdam: North-Holland.
[74] Hoover, D. N., 1978: “Probability Logic”, Annals of Mathematical Logic
14, 287–313.
[75] Howson, C., 2003: “Probability and Logic”, Journal of Applied Logic 1,
151–165.
[76] Huber, F. and Schmidt-Petri, C. (eds.), 2009: Degrees of Belief, Springer,
Synthese Library 342.
[77] Johnson, M. and Parikh, R., 2008: “Probabilistic Conditionals are Almost
Monotonic”, Review of Symbolic Logic 1, 73–80.
[78] Keisler, H. J., 1985: “Probability Quantifiers”, in: J. Barwise and S. Fefer-
man (eds.), Model-Theoretic Logics, New York: Springer, 509–556.
[79] Kooi, B.P., 2003: “Probabilistic Dynamic Epistemic Logic”, Journal of
Logic, Language and Information 12, 381–408.
[80] Kraus, S., Lehmann, D., and Magidor, M., 1990: “Nonmonotonic Reason-
ing, Preferential Models and Cumulative Logics”, Artificial Intelligence 44,
167–207.
[81] Kyburg, H.E., Jr., 1961: Probability and the Logic of Rational Belief,
Middletown: Wesleyan University Press.
[82] Lando, T., 2010: “Completeness of S4 for the Lebesgue Measure Algebra”,
Journal of Philosophical Logic 41, 287–316.
[83] Leblanc, H., 1979: “Probabilistic Semantics for First-Order Logic”,
Zeitschrift für mathematische Logik und Grundlagen der Mathematik 25,
497–509.
[84] Leblanc, H., 1983: “Alternatives to Standard First-Order Semantics”, in:
D. Gabbay and F. Guenthner (eds.), Handbook of Philosophical Logic, Vol-
ume I, Dordrecht: Reidel, 189–274.
[85] Lehmann, D. and Magidor, M., 1992: “What Does a Conditional Knowl-
edge Base Entail?”, Artificial Intelligence 55, 1–60.
[86] Leitgeb, H., 2004: Inference on the Low Level. An Investigation into De-
duction, Nonmonotonic Reasoning, and the Philosophy of Cognition, Dor-
drecht: Kluwer, Applied Logic Series.
[87] Leitgeb, H., 2012a: “A Probabilistic Semantics for Counterfactuals. Part
A”, Review of Symbolic Logic 5, 16–84.
[88] Leitgeb, H., 2012b: “A Probabilistic Semantics for Counterfactuals. Part
B”, Review of Symbolic Logic 5, 85–121.
[89] Leitgeb, H., 2012c: “From Type-Free Truth to Type-Free Probability”, in:
G. Restall and G. Russell (eds.), New Waves in Philosophical Logic, New
York: Palgrave Macmillan, 84–93.
[90] Leitgeb, H., 2014: “The Stability Theory of Belief”, The Philosophical
Review 123, 131–171.
[91] Levi, I., 1967: Gambling with the Truth. An Essay on Induction and the
Aims of Science, Cambridge, Mass.: The MIT Press.
[92] Lewis, D.K., 1973: Counterfactuals, Oxford: Basil Blackwell.
[93] Lewis, D.K., 1976: “Probabilities of Conditionals and Conditional Proba-
bilities”, The Philosophical Review 85, 297–315. Reprinted in D.K. Lewis,
Philosophical Papers, Vol. II, Oxford: Oxford University Press, 1986, 133–
156.
[94] Lewis, D.K., 1980: “A Subjectivist’s Guide to Objective Chance”, in: R.
Jeffrey (ed.), Studies in Inductive Logic and Probability, Vol. II, Berkeley:
University of California Press, 263–293. Reprinted in D.K. Lewis, Philo-
sophical Papers, Vol. II, Oxford: Oxford University Press, 1986, 83–132.
[95] Lin, H. and Kelly, K.T., 2012: “Propositional Reasoning that Tracks Prob-
abilistic Reasoning”, Journal of Philosophical Logic 41, 957–981.
[96] Maher, P., 1993: Betting on Theories, Cambridge: Cambridge University
Press.
[97] Makinson, D., 1989: “General Theory of Cumulative Inference”, in: M.
Reinfrank et al. (eds.), Non-Monotonic Reasoning, Lecture Notes on Arti-
ficial Intelligence, vol. 346, Berlin: Springer, 1–18.
[98] Makinson, D., 1994: “General Patterns in Nonmonotonic Reasoning”, in:
Gabbay et al. (1994), 35–110.
[99] Makinson, D., 2011: “Conditional Probability in the Light of Qualitative
Belief Change”, Journal of Philosophical Logic 40, 121–153.
[100] Makinson, D., 2012: “Logical Questions behind the Lottery and Preface
Paradoxes: Lossy Rules for Uncertain Inference”, Synthese 186, 511–529.
[101] McGee, V., 1989: “Conditional Probabilities and Compounds of Condi-
tionals”, The Philosophical Review 98, 485–541.
[102] Morgan, C., 1982: “Simple Probabilistic Semantics for Modal Logic”,
Journal of Philosophical Logic 11, 443–458.
[103] Nilsson, N., 1986: “Probabilistic Logic”, Artificial Intelligence 28, 71–87.
[104] Paris, J. and Simmonds, R., 2009: “O Is Not Enough”, Review of Symbolic
Logic 2, 298–309.
[105] Paris, J., 2011: “Pure Inductive Logic”, in: L. Horsten and R. Pettigrew
(eds.), The Continuum Companion to Philosophical Logic, London: Con-
tinuum, 428–449.
[106] Pearl, J., 1988: Probabilistic Reasoning in Intelligent Systems, San Mateo:
Morgan Kaufmann.
[107] Pearl, J. and Goldszmidt, M., 1996: “Probabilistic Foundations of Qualita-
tive Reasoning with Conditional Sentences”, in: G. Brewka (1996), 33–68.
[108] Pfeifer, N. and Kleiter, G.D., 2005: “Coherence and Nonmonotonicity in
Human Reasoning”, Synthese 146, 93–109.
[109] Pfeifer, N. and Kleiter, G.D., 2010: “The Conditional in Mental Probability
Logic”, in: M. Oaksford and N. Chater (eds.), Cognition and Condition-
als: Probability and Logic in Human Thought, Oxford: Oxford University
Press, 153–173.
[110] Popper, K.R., 1955: “Two Autonomous Axiom Systems for the Calculus
of Probabilities”, British Journal for the Philosophy of Science 6, 51–57.
[111] Ramsey, F.P., 1926: “Truth and Probability”, in: F.P. Ramsey, The Founda-
tions of Mathematics and other Logical Essays, edited by R.B. Braithwaite,
London: Kegan Paul, 1931, 156–198.
[112] Rescher, N., 1962: “A Probabilistic Approach to Modal Logic”, Acta Philo-
sophica Fennica 16, 215–226.
[113] Richardson, M. and Domingos, P., 2006: “Markov Logic Networks”, Ma-
chine Learning 62, 107–136.
[114] Roeper, P. and Leblanc, H., 1999: Probability Theory and Probability Se-
mantics, Toronto: University of Toronto Press.
[115] Ross, J. and Schroeder, M., forthcoming: “Belief, Credence, and Pragmatic
Encroachment”, Philosophy and Phenomenological Research.
[116] Schurz, G., 1997: “Probabilistic Default Logic Based on Irrelevance and
Relevance Assumptions”, in: D.M. Gabbay et al. (eds.), Qualitative and
Quantitative Practical Reasoning, Berlin: Springer, 536–553.
[117] Schurz, G., 1998: “Probabilistic Semantics for Delgrande’s Conditional
Logic and a Counterexample to his Default Logic”, Artificial Intelligence
102, 81–95.
[118] Schurz, G., 2001: “What is ‘Normal’? An Evolution-Theoretic Foundation
of Normic Laws and Their Relation to Statistical Normality”, Philosophy
of Science 68, 476–497.
[119] Scott, D. and Krauss, P., 1966: “Assigning Probabilities to Logical Formu-
las”, in: Hintikka and Suppes (1966), 219–264.
[120] Segerberg, K., 1971: “Qualitative Probability in a Modal Setting”, in: J.E.
Fenstad (ed.), Proceedings of the Second Scandinavian Logic Symposium,
Amsterdam: North-Holland, 341–352.
[121] Snow, P., 1999: “Diverse Confidence Levels in a Probabilistic Semantics
for Conditional Logics”, Artificial Intelligence 113, 269–279.
[122] Speranski, S.O., 2013: “Complexity for Probability Logic with Quantifiers
over Propositions”, Journal of Logic and Computation 23, 1035–1055.
[123] Spohn, W., 1988: “Ordinal Conditional Functions: A Dynamic Theory of
Epistemic States”, in: W.L. Harper, B. Skyrms (eds.), Causation in Deci-
sion, Belief Change, and Statistics, 2, Dordrecht: Reidel, 105–134.
[124] Spohn, W., 2012: The Laws of Belief: Ranking Theory and Its Philosophi-
cal Applications, Oxford: Oxford University Press.
[125] Stalnaker, R.C., 1968: “A Theory of Conditionals”, in: N. Rescher (ed.),
Studies in Logical Theory, Oxford: Blackwell, 98–112.
[126] Stalnaker, R.C., 1970: “Probability and Conditionals”, Philosophy of Sci-
ence 37, 64–80.
[127] Sturgeon, S., 2008: “Reason and the Grain of Belief”, Noûs 42, 139–165.
[128] Suppes, P., 1966: “Probabilistic Inference and the Concept of Total Evi-
dence”, in: Hintikka and Suppes (1966), 49–65.
[129] Swain, M. (ed.), 1970: Induction, Acceptance and Rational Belief, Dor-
drecht: Reidel.
[130] Terwijn, S.A., 2005: “Probabilistic Logic and Induction”, Journal of Logic
and Computation 15, 507–515.
[131] Wedgwood, R., 2012: “Outright Belief”, Dialectica 66, 309–329.
[132] Wheeler, G., 2007: “A Review of the Lottery Paradox”, in: W.L. Harper
and G. Wheeler (eds.), Probability and Inference: Essays in Honor of
Henry E. Kyburg Jr., London: King’s College Publications, 1–31.
[133] Yalcin, S., 2010: “Probability Operators”, Philosophy Compass 5, 916–
937.