
Probability in Logic

Hannes Leitgeb

This chapter is about probabilistic logics: systems of logic in which logical

consequence is defined in probabilistic terms. We will classify such systems and

state some key references, and we will present one class of probabilistic logics in

more detail: those that derive from Ernest Adams’ work.

1 Probability in Logic

Logic and probability have long been studied jointly: Boole (1854) is a classi-

cal example. If ‘logic’ is understood in sufficiently broad terms, then probability

theory might even be subsumed under logic (as a discipline). In the words of Ram-

sey (1926, p.157): “the Theory of Probability is taken as a branch of logic”. John

Maynard Keynes and Edwin Thompson Jaynes held similar views, and variants of

the view were defended more recently, e.g., by Howson (2003) and Haenni (2005).

In that sense, the probabilistic explication of the confirmation of hypotheses (as

initiated by Carnap 1950) may, for example, be regarded as a kind of probabilistic

(or inductive) logic; see the chapter on “Confirmation Theory” in this volume. On


the other hand, if used in such a broad manner, the label ‘probabilistic logic’ is no

longer particularly informative as far as its ‘logic’ component is concerned.

In this chapter, we will restrict the term ‘logic’ to logic proper: a logic or logical system is a triple of the form 〈L, ⊨, ⊢〉, where (i) L is a formal language, (ii) ⊨ is a semantically (model-theoretically) specified relation of logical consequence defined for the members of L, and (iii) ⊢ is a proof-theoretically (in terms of axioms and rules) specified relation of deductive consequence for the members of L. Ideally, ⊢ is sound with respect to ⊨ (that is, the extension of the ⊢ relation is a subset of the extension of the ⊨ relation), and ⊢ is complete with respect to ⊨ (the extension of the ⊨ relation is a subset of the extension of the ⊢ relation). However, not every logical system will satisfy both of these properties. Logic qua discipline is then the area in which logics in this sense are defined and in which they are studied systematically.

Now consider a logic in such a sense of the word: call the formal language L for which ⊨ and ⊢ are defined the ‘object language’, and call the language in which ⊨ and ⊢ are defined the ‘metalanguage’. We can then define probabilistic logics to be precisely those logics 〈L, ⊨, ⊢〉 for which the definition of ⊨ involves reference to, or quantification over, probability measures (which are then usually defined for the formulas in L or for subformulas thereof). And the area in which probabilistic logics in this sense are specified and described is probabilistic logic as a discipline. Probabilistic logic in this sense is the topic of this chapter. Reference to, or quantification over, probability measures on the metalevel will thus be a given in anything that follows.


For instance, in section 3 of this chapter we will consider a definition of logical

consequence for object language conditionals that will look like this:

• We say that

{ϕ1 ⇒ ψ1, . . . , ϕn ⇒ ψn} ⊨ α ⇒ β

iff for all ε > 0 there is a δ > 0, such that for all probability measures P: if for all ϕi ⇒ ψi it holds that P(ψi|ϕi) > 1 − δ, then P(β|α) > 1 − ε.

All of the technical details concerning this definition will be explained in section

3. What is relevant right now is just that this is a typical example of specifying a

probabilistic logic: for the logical consequence relation ⊨ is determined semantically by quantifying over probability measures (“for all probability measures P”).
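To see the quantifier structure at work, here is a small numerical sketch (ours, not Adams’; the measure and formulas are illustrative): the transitivity inference from p ⇒ q and q ⇒ r to p ⇒ r fails this criterion, since a single probability measure can make both premises highly probable while the conclusion has conditional probability 0, so no δ can be found for, say, ε = 1/2.

```python
# Worlds are (p, q, r) truth-value triples; a probability measure assigns
# each world a mass.  This measure puts 0.9 on (not-p, q, r) and 0.1 on
# (p, q, not-r); all other worlds implicitly get mass 0.
P = {(False, True, True): 0.9, (True, True, False): 0.1}

def prob(P, formula):
    """P(formula): total mass of the worlds satisfying the formula."""
    return sum(mass for world, mass in P.items() if formula(world))

def cond_prob(P, consequent, antecedent):
    """P(consequent | antecedent); undefined (None) if P(antecedent) = 0."""
    pa = prob(P, antecedent)
    if pa == 0:
        return None
    return prob(P, lambda w: antecedent(w) and consequent(w)) / pa

p = lambda w: w[0]
q = lambda w: w[1]
r = lambda w: w[2]

print(cond_prob(P, q, p))  # 1.0: the premise p => q holds with certainty
print(cond_prob(P, r, q))  # 0.9: the premise q => r is highly probable
print(cond_prob(P, r, p))  # 0.0: yet the conclusion p => r fails completely
```

Scaling the two masses toward 1 − δ and δ shows that the premises can be pushed arbitrarily close to 1 while the conclusion stays at 0.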

As we are going to see, research in probabilistic logic even in this restrictive

sense still cuts through various disciplines: theoretical computer science, artifi-

cial intelligence, cognitive psychology, and philosophy—especially, philosophical

logic, philosophy of science, formal epistemology, and philosophy of language.

However, the focus of this chapter will be on those aspects of probabilistic logic

that seem most relevant philosophically. For instance, the logical consequence re-

lation defined above extends nicely to a logico-philosophical theory of indicative

conditionals in natural language, as developed by Adams (1975).

The rest of this chapter is organized as follows. In section 2 we will turn to

a classification of probabilistic logics into two classes: (1.) those which do not involve reference to, nor quantification over, probability measures on the object level, and (2.) those which do. And within the second class, we will distinguish between (a.) probabilistic logics which do involve explicit reference to,

or quantification over, probability measures on the object level, and (b.) those for

which this kind of reference or quantification remains implicit. We will regiment

some of the essential references on probabilistic logic into the resulting simple

classification system. When doing so, we will refrain from going into formal de-

tails. And our selection of references will be, obviously, biased and incomplete.

Section 3 will then be devoted to a concrete, and formally more detailed, case

study of probabilistic logic(s): Ernest Adams’ logic of high probability condition-

als and some of its close relatives and variations. We have chosen this example

because it may safely be called the most influential, and probably also the most in-

novative, instance of probability in logic in the philosophical corner of the subject

matter. We will present six types of semantics for high probability conditionals in

that section. Although these semantics will look different initially, and although

they are based on different motivations, they will be seen to determine one and the

same deductive system of axioms and rules for conditionals: Adams’ system P.

Section 3 will be based partially on material from chapters 9-11 in Leitgeb (2004)

(albeit with substantial revisions).

Some final remarks before we start classifying probabilistic logics: First of

all, in probabilistic logic, probability measures are typically defined on formulas

rather than on sets (events) as standard probability theory would have it. We will

state the essentials of such probability measures for formulas in section 3, but only

for the very restrictive case of the language of propositional logic. For the exten-

sion to first-order languages, see Gaifman (1964), Scott and Krauss (1966), Fenstad


(1967), Fagin (1976), Gaifman and Snir (1982), Nilsson (1986), Richardson and

Domingos (2006) (for probabilities of first-order formulas as given by Markov

networks), and (as far as inductive logic is concerned) Paris (2011), which may

all be taken to develop probabilistic model theories for first-order formulas. Stan-

dard models or truth evaluations for formulas are thereby replaced by probability

measures, and truth values for formulas are replaced by probabilities.

Secondly, throughout this chapter, the underlying base logic for probability

measures will be assumed to be classical. E.g., one of the axioms for probability

measures on formulas (see section 3) will demand the probability of all classical

tautologies to be 1. But there are also probability measures for which classical

logic is not presupposed in this way: see the chapter on “Probability and Non-

Classical Logic” in this volume for more information.

Thirdly, probability measures can be interpreted in different ways: as an agent’s

rational degrees of belief in propositions, as objective non-epistemic chances of

the occurrence of physical events, as statistical ideal long-term frequencies of

properties applying to individuals in a certain domain, and more. Mostly, prob-

abilistic logics are open to different such interpretations simultaneously, which

is why we will not deal with the topic of interpreting probabilities very much,

even though one of these interpretations is usually put forward as the “intended”

such interpretation (and in most cases that intended interpretation is the subjective

“Bayesian” one in terms of rational degrees of belief).

Fourthly, by turning our attention to logical consequence relations on formal

object languages, we put to one side all theories that combine aspects of logic and


probability in a different manner. In particular, there is a substantial literature on

how to combine a logical account of (all-or-nothing) belief or acceptance with a

probabilistic account of numerical degrees of belief, starting with Kyburg (1961),

Hempel (1962), Levi (1967), Hilpinen (1968), and Swain (1970), through the

more recent literature (for overviews see Foley 1993, Maher 1993, Christensen

2004, Huber and Schmidt-Petri 2009) to the most recent of such theories (e.g.,

Hawthorne and Makinson 2007, Sturgeon 2008, Wedgwood 2012, Lin and Kelly

2012, Leitgeb 2014, Buchak, forthcoming, Ross and Schroeder, forthcoming).

Typically, these theories do not use formal languages (in the sense of formal logic)

when stating the logical closure properties of belief, or the probabilistic axioms

for degrees of belief, or principles of how belief relates to degrees of belief. Nor

do they aim to define logical consequence relations for formal languages in prob-

abilistic terms; which is why we will not cover these theories in this chapter.

2 The Classification of Probabilistic Logics

By our definition from the last section, a probabilistic logic 〈L, ⊨, ⊢〉 includes a logical consequence relation ⊨ that is specified on the metalevel by referring to, or quantifying over, probability measures.

The first main decision point for probabilistic logics concerns the question

of whether or not such a logic also involves reference to, or quantification over,

probability measures on the object level:

1. Probabilistic logics which do not involve reference to, nor quantification


over, probability measures on the object level:

These are logical systems in which ⊨ is defined in probabilistic terms, but

where the object language L itself (such as, e.g., the language of proposi-

tional logic) is not expressive enough to ascribe probabilities to formulas.

One group of references in this category emerges from Popper (1955) who

axiomatized primitive conditional probability measures for formulas au-

tonomously from logic, that is, without presupposing (meta-)logical con-

cepts such as tautology or logical consequence in the axioms for condi-

tional probability themselves. Such conditional probability measures are

not defined in terms of ratios of unconditional probabilities, as standard

probability theory has it. That is why they can allow for a conditional

probability P(β|α) to be defined even when P(α) = 0 (see Halpern 2001,

Makinson 2011, and the chapter on “Conditional Probability” in this volume for

an overview). Although logical concepts are not used in their definition,

these measures still end up being based on classical logic due to the manner

in which their axioms are set up. Indeed, it then becomes possible, in turn, to define logical concepts, such as the relation of logical consequence, for

the language of propositional logic in purely probabilistic terms. Popper’s

corresponding probabilistic account of logical consequence was extended

later also to first-order languages by Field (1977), Leblanc (1979), van

Fraassen (1981), and Roeper and Leblanc (1999), and to languages with

modalities by Morgan (1982) and Cross (1993). E.g., as far as the lan-

guage L of propositional logic is concerned, Field suggests defining logical


consequence probabilistically in the following manner: α1, . . . , αn ⊨ β if and only if for every primitive conditional probability measure P on L that satisfies Popper’s axioms, and for all formulas γ, it holds that P(β|γ) ≥ P(α1 ∧ . . . ∧ αn|γ). Leblanc (1983) gives a simpler definition of consequence in terms of unconditional probabilities (which can be defined from conditional probabilities): α1, . . . , αn ⊨ β if and only if for every probability measure P on L, if for every αi it holds that P(αi) = 1, then also P(β) = 1.

The second group of references in this category has its source in Suppes

(1966) who studied to what extent the probability of the conclusion of a

logically valid argument may fall below the probabilities of the premises

of the argument. It is easy to see that Suppes’ observations can be turned

into a probabilistic definition of ⊨ for the language L of propositional logic,

as worked out in detail by Ernest Adams. Adams is also responsible for

extending the account to conditionals α ⇒ β with a new primitive connec-

tive ⇒ that is not definable by means of the connectives of propositional

logic and which one may take to express high conditional probability. We

will turn to Adams’ work on high probability conditionals in more detail in

section 3, but as far as the language L of propositional logic is concerned,

logical implication for L may be defined probabilistically in the Suppes-

Adams style as follows: α1, . . . , αn ⊨ β if and only if for every probability measure P on L it holds that P(β) ≥ 1 − n + P(α1) + . . . + P(αn). This con-

sequence relation can then be shown to coincide extensionally with that of

classical logic. Here is an example of how this result can be applied: since


α1, α2 ⊨ α1 ∧ α2 in classical logic, it follows from applying the left-to-right

direction of the equivalence above to the case of n = 2 that if P(α1) > 1 − ε

and P(α2) > 1 − ε, then P(α1 ∧ α2) > 1 − 2ε (and one can also show that

this lower bound cannot be improved, unless additional information on the

logical structure of α1 and α2 is available).
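The tightness of this bound can be verified concretely. The following sketch (ours, with an illustrative ε and exact rational arithmetic) constructs a measure over the four truth-value combinations of α1 and α2 at which the bound 1 − 2ε is attained exactly:

```python
from fractions import Fraction

eps = Fraction(1, 20)  # an illustrative margin, eps = 0.05

# Masses on the four truth-value pairs (alpha1, alpha2); they sum to 1.
P = {
    (True, True): 1 - 2 * eps,   # both alpha1 and alpha2 true
    (True, False): eps,          # only alpha1 true
    (False, True): eps,          # only alpha2 true
    (False, False): Fraction(0),
}

def prob(event):
    return sum(m for w, m in P.items() if event(w))

# Both premises have probability exactly 1 - eps ...
print(prob(lambda w: w[0]) == 1 - eps, prob(lambda w: w[1]) == 1 - eps)  # True True
# ... while their conjunction sits exactly at the lower bound 1 - 2*eps.
print(prob(lambda w: w[0] and w[1]) == 1 - 2 * eps)  # True
```

Since the construction works for any 0 < ε ≤ 1/2, no better general lower bound than 1 − 2ε is available.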

Now we turn to systems of probabilistic logic the object languages of which are expressive enough to ascribe probabilities to formulas.

2. Probabilistic logics which do involve reference to, or quantification over,

probability measures on the object level:

The first class of such probabilistic logics concerns object languages that

allow for ascribing probabilities to formulas explicitly:

(a) Probabilistic logics which do involve explicit reference to, or quantifi-

cation over, probability measures on the object level:

By ‘explicit’ we mean: Either a sentential probabilistic operator or a

probabilistic generalized quantifier is applied to a formula, or alter-

natively a probabilistic function symbol is applied to (the name of)

a formula. The result of these applications is then combined some-

how with expressions of the form ‘= r’, ‘> r’, or the like, where

‘r’ is a numeral denoting a real number in the unit interval. This

leads to probabilistic formulas such as: ‘P(α) = r’ (“the probabil-

ity of α is r”) or ‘P(α) ≥ r’ (“the probability of α is greater than or equal to r”) or the like. A probability measure P can be said to sat-

isfy such a formula, if interpreting the symbol ‘P’ by the measure P

yields a true statement. Finally, logical consequence relations are de-

fined for formal languages that include probabilistic formulas of such

types. Usually, this is achieved by defining consequence in terms of

truth preservation in all probability models for the object languages in

question. Roughly: α1, . . . , αn ⊨ β if and only if for every probability

measure P on L, if P satisfies each of α1, . . . , αn, then P satisfies β.

And deductive consequence relations may be defined which are then

proven sound and, where possible, complete with respect to logical

consequence. In a nutshell: this part of probabilistic logic deals with

formalizations of the language of probability theory or various natu-

ral fragments thereof, such that the formalization pays off in terms of

an improved control over the relations of logical and deductive conse-

quence. This is in contrast with standard probability theory in which

the entailment relation is usually left merely informal and implicit.
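As a minimal sketch of this truth-preservation picture (our illustrative encoding, not that of any of the cited formalisms): a measure satisfies a formula of the form ‘P(α) ≥ r’ just in case the mass of the worlds verifying α is at least r.

```python
from fractions import Fraction

# Worlds are (p, q) truth pairs; the measure leaves (False, False) at mass 0.
P = {
    (True, True): Fraction(1, 2),
    (True, False): Fraction(1, 4),
    (False, True): Fraction(1, 4),
}

def prob(measure, alpha):
    return sum(m for w, m in measure.items() if alpha(w))

def satisfies(measure, alpha, r):
    """The measure satisfies the object-language formula 'P(alpha) >= r'."""
    return prob(measure, alpha) >= r

p = lambda w: w[0]
q = lambda w: w[1]
p_or_q = lambda w: w[0] or w[1]

print(satisfies(P, p, Fraction(3, 4)))        # True:  P(p) = 3/4
print(satisfies(P, p_or_q, Fraction(9, 10)))  # True:  P(p or q) = 1
print(satisfies(P, q, Fraction(4, 5)))        # False: P(q) = 3/4 < 4/5
```

Consequence is then preservation of satisfaction: β follows from α1, . . . , αn just in case every measure satisfying all the premises also satisfies β.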

For instance, one way of presenting Hailperin’s (1984, 1996, 2000)

probabilistic logics is for them to involve object languages in which

one can express that the probability of a formula is within a particular

interval of real numbers. However, it is not yet possible to express in

these languages the probabilities of probability statements themselves,

that is, so-called second-order or higher-order probabilities, as for ex-

ample ‘P(P(α) ∈ [r, s]) ∈ [r′, s′]’ (“the probability that the probability


that α is in the interval [r, s] is in the interval [r′, s′]”). Gaifman (1986)

is the classical source for the logical treatment of higher-order prob-

abilities, and most of the literature on probabilistic logics that allow

for higher-order probabilities builds on it. (Such higher-order proba-

bility statements can be varied in lots of ways, so that e.g. “outer” and

“inner” occurrences of probabilistic symbols are assigned distinct in-

terpretations; see e.g. Lewis 1980, van Fraassen 1995, Halpern 1991.)

Another landmark paper in that area is Fagin, Halpern, and Meggido

(1990), who essentially logically formalize Nilsson’s (1986) account

of probabilistic reasoning on formulas (which had not been stated with

a formal language as yet). Their object language is even more expres-

sive than what we dealt with before. E.g., in their language one can

say that the weighted sum of probabilities of finitely many formulas

is greater than or equal to a certain real number, as in a1P(α1) + . . . +

anP(αn) ≥ r, where each of the αi may again include logical connec-

tives or P. The resulting language can also be extended to encompass

Boolean combinations of such inequalities, inequalities for conditional

probabilities, and first-order quantification of real numbers (based on

the first-order theory of real closed fields). The authors provide se-

mantic interpretations for these object languages. Relying on findings

from linear programming, they determine sound and complete axiom-

atizations of the corresponding logical consequence relations, and they

state decision procedures for the corresponding NP-complete satisfiability problems.

Here are some further closely related probabilistic logics in the same

category: Frisch and Haddawy’s (1988) object language is less expres-

sive than Fagin, Halpern, and Meggido’s, although one can still say

that the probability of a formula is in a certain interval of real numbers,

and the corresponding probability operators can be nested again so that

also higher-order probabilities can be ascribed. On the logical side,

building on Gaifman’s (1986) work, the semantics is set up so that

Miller’s principle—a typical instance of a higher-order so-called prob-

abilistic reflection principle—is logically valid: P(α | P(α) ∈ [r, s]) ∈

[r, s] (“the conditional probability of α, given that the probability of α

is in the interval [r, s], is in the interval [r, s]”).

Heifetz and Mongin (2001) is another theory that employs a language

less expressive than Fagin, Halpern, and Meggido’s, but it comes with

a special benefit: a less demanding fragment of arithmetic needs to

be built into the corresponding axiomatic system. Speranski (2013)

extends Fagin, Halpern, and Meggido’s account by adding also quan-

tifiers over propositions.

The object language of Bacchus’ (1990a, 1990b) probabilistic logic

is even more expressive than Fagin, Halpern, and Meggido’s, at least

as far as quantification is concerned, but there are also differences in

terms of interpretation: while Fagin, Halpern, and Meggido’s prob-

ability measures are most easily interpreted as expressing subjective


probabilities of closed sentences that determine sets of possible worlds,

Bacchus also considers probability measures which are best interpreted

as expressing statistical probabilities of open formulas that determine

ensembles of individuals in a domain. (See also Hoover 1978 and

Keisler 1985 for probabilistic logics with generalized quantifiers that

concern probabilities of sets of tuples of individuals.) Such subjec-

tive and statistical probability measures can also be combined and ex-

pressed in one and the same logical system, as developed in Halpern

(1990), Bacchus et al. (1996), and chapter 11 of Halpern (2003).

We should also mention some complexity and axiomatizability results. We already men-

tioned Bacchus’ (1990a, 1990b) system: his axiomatic system is com-

plete with respect to models that are based on nonstandard probability

measures. (More will be said about nonstandard probability in section

3.) But now consider systems for subjective probability, or statistical

probability, or both combined, where the object language includes: at

least one probabilistic function symbol, the equality symbol, quanti-

fiers (and at least one individual constant symbol). For any system of

such type, Abadi and Halpern (1994) showed that if its set of logical

truths is determined by models that involve only standard probabil-

ity measures, then that set is not recursively axiomatizable anymore

(unless further syntactic restrictions are invoked).

Fagin and Halpern (1994) extend the theory of Fagin, Halpern, and

Meggido (1990) in a different direction, by adding epistemic operators such as for knowledge, and Kooi (2003) and van Benthem, Ger-

brandy, Kooi (2009) further extend the account by invoking dynamic

epistemic or probabilistic operators such as for knowledge change and

probability change.

Finally, originating from a very different background—formal theories

of truth and the study of semantic paradoxes (such as the famous Liar

paradox)—Leitgeb (2012) even allows for type-free probabilities: he

presents different systems of probabilistic logic in which probabilities

are ascribed to formulas that may speak about their own probabilities,

such as a formula α that is provably equivalent to P(α) < 1 (so that

α may be said to express: my probability is less than 1). Christiano

et al. (unpublished) present an alternative theory of type-free proba-

bility, and Caie (2013) gives reasons why one ought to be interested

philosophically in type-free probability in that sense.

(b) Probabilistic logics which only involve implicit reference to, or quan-

tification over, probability measures on the object level:

Whereas the previous category of probabilistic logics concerned ways

of expressing probabilities on the usual numerical scale of concepts,

the probabilistic logics in this category typically involve expressions

for probabilities that merely occupy a categorical (all-or-nothing) or

ordinal (comparative) scale of concepts. In the words of Halpern and

Rabin (1987, p.381): “probability theory is not the only way of reason-

ing about likelihood”. The relations of logical consequence for the corresponding object languages are either defined in terms of truth preser-

vation again—over probability measures themselves or over possible

worlds models that are given a probabilistic interpretation—or they

are not defined in terms of satisfaction at all. We already encountered

Adams’ example of the latter kind in section 1, and we will return to

this in more detail in section 3.

One group of such logical systems concerns formal languages with

an ‘it is (highly) probable that’ operator. With it, one is able to ex-

press that P(α) > r (or maybe instead P(α) ≥ r) for a fixed real number threshold 1/2 < r < 1 that is not denoted explicitly in the object

language. Hamblin (1959) was probably the first to study this (but

still without iterations of the probability operator). Burgess (1969)

presents a semantics for such an operator (in the strictly-greater-than

version). He also presents sound, but not complete, axiomatizations of

the corresponding logic even in the case in which nestings of the oper-

ator are allowed. So does Arlo-Costa (2005) who suggests a neighbor-

hood semantics for the operator. And Burgess (1969) gives decision

procedures for the set of logically true (valid) formulas and the set of

satisfiable formulas relative to a given threshold 1/2 ≤ r < 1.

All of these logics are characterized by ‘it is probable that α’ and ‘it is

probable that β’ failing to entail jointly ‘it is probable that α ∧ β’, in

line with the fact that the probability of a conjunction may fall below

that of its conjuncts (as exemplified nicely in Kyburg’s famous Lottery


Paradox—see Wheeler 2007 for an overview). This is clearly in con-

trast with normal systems of modal logic, which are based on a possible worlds semantics rather than a neighborhood semantics. For according

to them, ‘it is necessary that α’ and ‘it is necessary that β’ do jointly

entail ‘it is necessary that α ∧ β’.
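The lottery pattern behind this failure of conjunction can be sketched in a few lines (ours; the three-ticket setup and the threshold r = 3/5 are illustrative): each statement ‘ticket i loses’ passes the threshold, while their conjunction has probability 0.

```python
from fractions import Fraction

# A fair three-ticket lottery: worlds are labelled by the winning ticket.
P = {"t1": Fraction(1, 3), "t2": Fraction(1, 3), "t3": Fraction(1, 3)}

def prob(event):
    return sum(m for w, m in P.items() if event(w))

r = Fraction(3, 5)  # an illustrative threshold with 1/2 < r < 1

loses = {t: (lambda w, t=t: w != t) for t in P}

# 'It is probable that ticket t loses' holds for every ticket ...
print(all(prob(loses[t]) > r for t in P))  # True: each loss has P = 2/3 > 3/5
# ... but 'it is probable that every ticket loses' fails outright.
print(prob(lambda w: all(f(w) for f in loses.values())) > r)  # False: P = 0
```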

Halpern and Rabin (1987) develop yet another, though somewhat dif-

ferent, logical system for such an ‘it is (highly) probable that operator’.

And Terwijn (2005) studies a probabilistic logic the object language

of which is that of first-order logic but where the truth condition of

universally quantified formulas is given by a probabilistic threshold

condition again.

The second group of references in this category concerns logics for

an ‘it is probabilistically certain that’ operator by which one can ex-

press that P(α) = 1. The system of Rescher (1962), in which the box

or “necessity” operator is interpreted in such a probabilistic manner,

is an early example (but see also Hailperin 1937). The modal logic

and semantics that emerge from this interpretation correspond to that

of the standard modal system S5, in which nestings of the new oper-

ator are allowed and where ‘it is probabilistically certain that α’ and

‘it is probabilistically certain that β’ do jointly entail ‘it is probabilis-

tically certain that α ∧ β’. Of course, this is just as it should be, as the

axioms of probability do imply that the probability of a conjunction

is 1 if the probability of each of its conjuncts is. Lando (2010) is a


different, and more recent, example of a normal modal logic (in her

case, S4) in which the box operator gets assigned a measure-theoretic

interpretation (though a different one than Rescher’s).

The next two groups of logical systems in the present class are exten-

sions of the first and the second group, respectively, to probabilistic

conditional operators⇒ in the object language, or to binary so-called

nonmonotonic consequence relations |∼ that are expressed metalin-

guistically but which may be viewed as corresponding to sets or theo-

ries of probabilistic object-linguistic conditionals closed under certain

rules. In particular, Hawthorne (1996, 2007), Hawthorne and Makin-

son (2007), and Makinson (2012) study relations |∼ between formulas

in the language of propositional logic, such that α |∼ β if and only if

P(β|α) > r (or P(α) = 0). Here, P is again a given probability measure, and r is a given real number threshold with 1/2 < r < 1, and the threshold is again not denoted explicitly in the object language. Arlo-

Costa and Parikh (2005) also determine nonmonotonic consequence

relations probabilistically but they do so for the probability 1 case,

such that α |∼ β if and only if P(β|α) = 1. However, in their case P

is assumed to be a primitive conditional probability measure as dis-

cussed briefly in the context of Popper’s work in our first category of


probabilistic logics from above. While the rule

(And)   from α |∼ β and α |∼ γ, infer α |∼ (β ∧ γ),

is logically valid in Arlo-Costa and Parikh’s system, it is invalid in

Hawthorne and Makinson’s system. If Arlo-Costa and Parikh’s logic

for nonmonotonic consequence relations is reconstructed as a logic

for conditionals—so that α ⇒ β expresses in the object language that

α |∼ β (or P(β|α) = 1) holds as expressed in the metalanguage—

then the resulting logical consequence relation � for such conditionals

is monotonic again, and it can be axiomatized in a sound and com-

plete manner in terms of (Adams’) logical system P in section 3 be-

low. (For this to be the case it is crucial that Arlo-Costa and Parikh

assume ‘P’ to refer to a primitive conditional probability measure that

satisfies Popper’s axioms.) And if the logic of Hawthorne and Makin-

son is reconstructed as a logic of conditionals in a similar manner,

then, metaphorically speaking, its axiomatization can be seen as the

result of “subtracting” the And rule above from the system P in sec-

tion 3. However, it turns out to be quite difficult to state a sound and

complete axiomatization for the logical consequence relation that is

wanted. That relation � is given semantically by:

• {ϕ1 ⇒ ψ1, . . . , ϕn ⇒ ψn} ⊨ α ⇒ β iff for all P, for all r ∈ [0, 1]: if for all ϕi ⇒ ψi it holds that P(ψi|ϕi) > r (or P(ϕi) = 0), then P(β|α) > r (or P(α) = 0).

Hence, logical consequence corresponds to probability preservation

above a threshold. Hawthorne and Makinson (2007) conjectured that

their deductive system O was sound and complete (for Horn rules with

finitely many premises). However, Paris and Simmonds (2009) proved

it to be incomplete, while the infinite system of rules that Paris and

Simmonds ultimately did prove to be sound and complete is highly

complicated and not very intuitive.
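The role of the And rule here can again be checked numerically. In this sketch (ours, with illustrative numbers), a single measure gives both P(β|α) and P(γ|α) values above a fixed threshold r while P(β ∧ γ|α) falls below it, so α |∼ β and α |∼ γ hold without α |∼ (β ∧ γ):

```python
from fractions import Fraction

# Masses of the (beta, gamma) truth pairs within the alpha-worlds, i.e. this
# dictionary directly encodes the conditional measure P( . | alpha).
P_given_alpha = {
    (True, True): Fraction(2, 5),    # beta and gamma
    (True, False): Fraction(3, 10),  # beta only
    (False, True): Fraction(3, 10),  # gamma only
}

def prob(event):
    return sum(m for w, m in P_given_alpha.items() if event(w))

r = Fraction(3, 5)  # an illustrative fixed threshold with 1/2 < r < 1

print(prob(lambda w: w[0]) > r)           # True:  P(beta | alpha)  = 7/10
print(prob(lambda w: w[1]) > r)           # True:  P(gamma | alpha) = 7/10
print(prob(lambda w: w[0] and w[1]) > r)  # False: P(beta & gamma | alpha) = 2/5
```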

Adams’ logic of high probability conditionals to which we will turn in

more detail in the next section lies somewhere in between Hawthorne

and Makinson’s and Arlo-Costa and Parikh’s accounts: the And rule

that was mentioned above is logically valid in Adams’ system (if stated

for conditionals), while Adams’ intended interpretation of α ⇒ β is

that the conditional probability of β given α is high, but not necessarily

equal to 1. The difference from Hawthorne and Makinson’s interpretation is that the threshold defining the term ‘high’ remains only vaguely determined: no fixed real number threshold is intended to be “the” correct one.

Finally, there are probabilistic logics which also belong to the present

category but which represent probability measures in the object lan-

guage differently from the logics discussed so far. E.g., Segerberg

(1971) and Gardenfors (1975) study logical systems with an ‘is at least as probable as’ operator by which the laws of so-called qualitative probability (which originated with Bruno de Finetti) can be expressed in logical terms. Baltag and Smets (2008) develop logics for

dynamic operators that represent probabilistic update in the object lan-

guage. Yalcin (2010) presents a nice survey on probabilistic operators

of various kinds from a philosophical and linguistic point of view. Yal-

cin’s paper also includes further relevant references to logical studies

of probabilistic operators on a categorical or ordinal scale.

3 A Case Study: Probabilistic Logics for High Prob-

ability Conditionals

In this final section we will study six distinct but extensionally equivalent se-

mantics for “high probability” conditionals, which all derive from Ernest Adams’

work. Afterwards, we will turn to their axiomatic treatment.

3.1 Semantics for High Probability Conditionals

We need some preliminary definitions before we can state the different versions

of probability semantics for high probability conditionals.

First of all, throughout the following two subsections, let L be the formal

language of standard propositional logic, except that L is restricted to only finitely

many propositional variables p1, . . . , pn. As far as logical symbols are concerned,

we assume the standard connectives of classical propositional logic to be included


in the vocabulary of L: ¬, ∧, ∨, → (for the material conditional), ↔ (for material

equivalence). So L contains formulas α such as ¬p1, p2 → (p3 ∨ p4), ¬(p5 ∧ ¬p5),

and the like (assuming that n > 5), as usual.

Secondly, let W be the (finite) set of all classical truth value assignments to

p1, . . . , pn. More briefly, we shall speak of W as the set of all logically possible

worlds over L, since each single member of W determines uniquely a logically

possible model or way of assigning truth values to the formulas in L in line with

the usual semantic rules. If the model determined by w in W satisfies α, then

we will denote this by: w ⊨ α. In the terminology of probability theory, W is

going to function as the sample space of our probability measures; accordingly,

the members of W may also be regarded as the possible outcomes of a random

experiment.

Thirdly, with that set W being in place, call 〈W, ℘(W), P〉 a probability space

(over W) if and only if (i) ℘(W) is the power set over W (the set of all subsets

of W), and (ii) P is a probability measure on ℘(W), that is, P : ℘(W) → [0, 1],

P(W) = 1, P(∅) = 0, and the axiom of finite additivity holds: for all X, Y ⊆ W

such that X ∩ Y = ∅, it is the case that P(X ∪ Y) = P(X) + P(Y). (The axiom of so-

called countable additivity or σ-additivity will not be assumed and will not play a

role in any of the following.) Conditional probabilities can then be introduced by

means of P(Y|X) = P(X ∩ Y)/P(X) in case P(X) > 0. In one of the semantic systems below

we will actually deviate from this definition by allowing also for non-standard

real numbers in the unit interval to be assigned by P; but in all of the other seman-

tic systems we will stick to the definition just presented. The members of ℘(W)


will be called ‘propositions’—W is the largest or “tautological” proposition, ∅

is the least or “contradictory” proposition—and thus probability measures in this

sense assign probabilities to propositions and not (yet) to formulas. In standard

probability theory, the members of ℘(W) would rather be called ‘events’, but the

difference is irrelevant really. (More importantly, standard probability theory al-

lows for certain subsets of the sample space W not to be assigned probabilities at

all; this will not be important either in what follows.)

Fourthly, although each P assigns probabilities to propositions, it may be used

also, indirectly, to assign probabilities to formulas in L (and we will use the same

function symbol ‘P’ for that purpose): for α in L, let [α] = {w in W | w ⊨ α}. [α] is

the set of worlds in which α is true, and we regard it as the proposition expressed

by α. And for each α we can then define: P(α) = P([α]). Accordingly, for α, β

in L, define P(β|α) = P(α ∧ β)/P(α) in case P(α) > 0. In order to simplify matters a bit

later on, we will also regard P(β|α) as well-defined, and indeed equal to 1, if

P(α) = 0. P(β|α) is the conditional probability that will be associated later with

the high probability conditional α⇒ β.

Fifth, following Ernest Adams’ lead, we define the so-called uncertainty of β

given α by means of Unc(β|α) = 1 − P(β|α). Unc(β|α) will be the uncertainty

associated with the high probability conditional α⇒ β.
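The preliminary apparatus above is easy to sketch in code (a minimal illustration of our own; the names `worlds`, `P_single`, `P_cond`, and `unc` are not from the text): worlds are truth-value assignments, propositions are sets of worlds, and P(β|α) follows the stipulation of being 1 whenever P(α) = 0.

```python
from itertools import product

# Worlds over n = 2 propositional variables p1, p2: all truth-value assignments.
n = 2
worlds = list(product([True, False], repeat=n))

# An example probability measure on the power set of W, given on singletons.
P_single = {worlds[0]: 0.5, worlds[1]: 0.3, worlds[2]: 0.2, worlds[3]: 0.0}

def P(prop):
    """Probability of a proposition, i.e. a set of worlds."""
    return sum(P_single[w] for w in prop)

def ext(formula):
    """[alpha]: the proposition expressed by a formula (a Boolean function on worlds)."""
    return {w for w in worlds if formula(w)}

def P_cond(beta, alpha):
    """P(beta | alpha), stipulated to be 1 if P(alpha) = 0."""
    pa = P(ext(alpha))
    if pa == 0:
        return 1.0
    return P(ext(lambda w: alpha(w) and beta(w))) / pa

def unc(beta, alpha):
    """Adams' uncertainty: Unc(beta | alpha) = 1 - P(beta | alpha)."""
    return 1.0 - P_cond(beta, alpha)

p1 = lambda w: w[0]
p2 = lambda w: w[1]
```

With this measure, P_cond(p2, p1) = 0.5/0.8 = 0.625, and a conditional probability with an antecedent of probability 0 comes out as 1, matching the convention adopted above.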

Finally, let our conditional language L⇒ be the set of conditionals of the form

α ⇒ β for which the antecedent α and the consequent β are formulas in L. ⇒

is a conditional connective that is not included in the vocabulary of L, in partic-

ular, it is meant to differ from the symbol → for the material conditional. The


intended interpretation of the new conditional α ⇒ β will be ‘if α, then it is

highly likely that β’ or ‘the conditional probability of β given α is high’. We

want to leave open whether asserting such a conditional is meant to express the

proposition that the corresponding conditional probability is high or whether as-

serting it merely expresses the corresponding high conditional probability in a

more direct, non-propositional, “expressivist” manner (for the difference between

the two interpretations, see section 44 of Bennett 2003). Either way, we will

call these conditionals ‘high probability conditionals’, so that “their” probabilities

are given in terms of their corresponding conditional probabilities, and it is these

conditional probabilities that are taken to be high. As Lewis (1976) showed in

terms of his famous triviality theorems, and as subsequent work on the same topic

made even clearer (such as by Hajek 1989), this ‘probabilities of conditionals are

conditional probabilities’ claim ought not to be understood in the way that the un-

conditional probability of the proposition expressed by α ⇒ β would be required

to equal the conditional probability of β given α. This is because it would entail

that the underlying probability measure is trivial as far as its range of possible

numerical values is concerned, given only some very mild background assumptions.

Instead, if one wants to speak of probabilities of conditionals at all, one should

think of their probabilities as being defined as conditional probabilities without

any assumption to the effect that probabilities of conditionals would also have to

satisfy the axioms of unconditional probability. Also note that, syntactically, the

members of our conditional language L⇒ are “flat”: they allow neither nestings

of conditionals nor the application of any of the connectives of classical

propositional logic to conditionals. For instance, L⇒ does not include negations

of conditionals.

When we are going to study logical consequence relations for this conditional

language L⇒, we will focus on finite sets KB⇒ ⊆ L⇒ of such conditionals, which

will then function as finite sets of conditional premises or as finite (probabilistic)

conditional knowledge bases (as theoretical computer scientists would say). We

use the notation ‘KB⇒’, with the subscript ‘⇒’, in order to signal that any such

KB⇒ is a set of conditionals. Although we do not include any “factual”, that is,

non-conditional, formulas in L⇒ nor in any KB⇒ ⊆ L⇒, for many applications

one may think of conditionals > ⇒ α with the tautological antecedent > as being

logically equivalent to the factual formula α in L. In particular, this makes good

sense if one thinks of ⇒ as representing the indicative ‘if-then’ in natural lan-

guage, and, accordingly, Adams does treat α and > ⇒ α as logically equivalent.

We are now ready to present six probabilistic semantics for high probability

conditionals. Each semantics—except for the infinitesimal semantics—is based

essentially on some probability semantics that had been suggested by Adams (see,

e.g., Adams 1966, 1974, 1975, 1986, 1996, 1998, and Adams and Levine 1975).

Adams’ semantic systems were further refined and extended by Pearl (1988),

McGee (1989), Lehmann and Magidor (1992), Edgington (1995), Goldszmidt

and Pearl (1996), Schurz (1997, 1998), Snow (1999), Bamber (2000), Biazzo et

al. (2002), Halpern (2003), Arlo-Costa and Parikh (2005), and Leitgeb (2012a,

b).

Each of the semantic systems below includes the definition of a logical entailment

relation that holds between finite sets of high probability conditionals

and further such conditionals. Each of these definitions will seem to be more or

less plausible in itself, but they will all be based on different philosophical ideas

and motivations: While semantics 2, 4, 6 are defined in terms of truth preserva-

tion, semantics 1, 3, and 5 do not involve the notion of truth of a high probability

conditional in a model at all. Whereas semantics 1 and 2 understand logical con-

sequence dynamically in terms of ‘the more likely the premises get, the more

likely the conclusion gets’, all the other semantics are static. Where semantics 3

and 5 concern the reliability of reasoning with conditionals, as they demand the

probability of a conclusion not to drop too much below the probabilities of the

premises, semantics 4 and 6 take a more idealized viewpoint by considering prob-

abilistic orderings of worlds or infinitesimal probabilities. But, surprisingly, all of

these definitions can be shown ultimately to determine (extensionally) one and the

same relation of logical consequence for high probability conditionals, as we are

going to see later. The resulting sound and complete deductive system of logical

axioms and rules is Adams’ logic P of conditionals, which therefore turns out to

be robustly justified on quite diverse semantic grounds.

According to the first semantic system that we introduce, a set of high proba-

bility conditionals entails another high probability conditional if and only if: the

higher the probabilities of the conditionals contained in the premise set, the higher

also the probability of the conditional conclusion. This leads to a kind of “con-

tinuity” semantics for high probability conditionals which, accordingly, employs

an ε-δ-criterion:


Definition 1 (Continuity Semantics for High Probability Conditionals)

• We say that

KB⇒ ⊨cont α ⇒ β

iff for all ε > 0 there is a δ > 0 such that for all probability measures P:

if for all ϕ ⇒ ψ in KB⇒ it holds that P(ψ|ϕ) > 1 − δ, then P(β|α) > 1 − ε

(that is: if P(ψ|ϕ) is “high” for all ϕ ⇒ ψ in KB⇒, then P(β|α) is also “high”).

It is well known that the definition of continuous functions over the reals can

either be stated in terms of an ε-δ-criterion or in terms of the preservation of

limits along sequences of real numbers. Similarly, the continuity semantics

above allows for a restatement in terms of a sequence semantics, where a sequence

of probability measures is defined to satisfy a high probability conditional if the

conditional probability associated with the conditional is identical to 1 “in the

limit” of the sequence. Adams (1986, p.277) hints at such a type of semantics in a

footnote. Variants of such a sequence semantics—but defined on more expressive

languages than our simple L—are employed by Halpern (2003) in his system

of inductive reasoning for statistical and subjective probabilities, and by Leitgeb

(2012a, b) in his probability logic for counterfactuals:

Definition 2 (Sequence Semantics for High Probability Conditionals)

• A probabilistic sequence model Mseq for high probability conditionals is a

sequence (Pn)n∈N of probability measures.


• Relative to a probabilistic sequence model Mseq = (Pn)n∈N we can define:

Mseq ⊨seq α ⇒ β

iff the real sequence (Pn(β|α))n∈N converges, and

lim_{n→∞} Pn(β|α) = 1

(that is: Pn(β|α) “tends” towards 1 for increasing n).

• Mseq ⊨seq KB⇒ iff for every α ⇒ β in KB⇒ it holds that Mseq ⊨seq α ⇒ β.

• We say that

KB⇒ ⊨seq α ⇒ β

(KB⇒ sequence-entails α⇒ β) iff

for every probabilistic sequence model Mseq:

if Mseq ⊨seq KB⇒, then Mseq ⊨seq α ⇒ β.
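A toy probabilistic sequence model makes the limit clause concrete (our own example; the two-world setup is illustrative): both worlds satisfy α, and the n-th measure gives the world satisfying α ∧ β mass 1 − 1/n, so Pn(β|α) = 1 − 1/n converges to 1 and the model satisfies α ⇒ β in the sense of the sequence semantics.

```python
def P_n_cond(n):
    """P_n(beta | alpha) in a two-world model where alpha holds in both worlds
    and the beta-world carries mass 1 - 1/n."""
    mass_ab = 1.0 - 1.0 / n      # world satisfying alpha and beta
    mass_a_notb = 1.0 / n        # world satisfying alpha and not beta
    return mass_ab / (mass_ab + mass_a_notb)

# The conditional probabilities climb towards 1 along the sequence.
values = [P_n_cond(n) for n in (1, 10, 100, 10**6)]
```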

Next, we turn to a semantics for high probability conditionals that does not

involve anything like probabilities getting “arbitrarily close to 1”. A set of high

probability conditionals will instead be said to entail a high probability condi-

tional if the uncertainty associated with the latter is smaller than or equal to the

sum of the uncertainties of the conditionals contained in the premise set; that is,

if the uncertainty of the conditional to be entailed is bounded additively by the


uncertainties that are associated with the premise conditionals. In contrast with

the two semantic systems from before, if a set of high probability conditionals en-

tails another such conditional in this sense, there is always a lower bound for the

probability that is associated with the conclusion, and this lower bound can

moreover be computed easily. As Schurz (1997) points out, the resulting entail-

ment relation approximates the so-called “quasi-tightness” property of inferences

that was defined in Frisch and Haddawy (1994). This kind of uncertainty seman-

tics, which had been introduced by Adams again, was taken up and defended for

example by Edgington in her theories of indicative conditionals (Edgington 1995)

and vague terms (Edgington 1996); similarly, Field (2009) models his account of

how logical implication interacts normatively with degrees of belief after this kind

of (Suppes-)Adams-style uncertainty semantics:

Definition 3 (Uncertainty Semantics for High Probability Conditionals)

• We say that

KB⇒ ⊨unc α ⇒ β

(KB⇒ uncertainty-entails α⇒ β) iff

for every probability measure P (and where a sum over an empty set of

indices is defined to be 0):

P(β|α) ≥ 1 − Σ_{ϕ⇒ψ ∈ KB⇒} Unc(ψ|ϕ),

that is,

Unc(β|α) ≤ Σ_{ϕ⇒ψ ∈ KB⇒} Unc(ψ|ϕ)

(in words: P(β|α) is “high” if the uncertainties Unc(ψ|ϕ) are very “low”

for all ϕ ⇒ ψ ∈ KB⇒; or: for all probability measures, it holds that the

uncertainty associated with α ⇒ β is bounded from above by the sum of the

uncertainties associated with the premises).
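For instance, the And rule mentioned above (from α ⇒ β and α ⇒ γ to α ⇒ (β ∧ γ)) is sound in this sense: Unc(β ∧ γ|α) ≤ Unc(β|α) + Unc(γ|α) holds for every probability measure. A randomized spot-check of this inequality (our own sketch; the bit-pattern encoding of worlds is illustrative):

```python
import random

random.seed(0)

# Worlds 0..7 encode truth values of alpha, beta, gamma in bits 0, 1, 2.
A = {i for i in range(8) if i & 1}
B = {i for i in range(8) if i & 2}
C = {i for i in range(8) if i & 4}

def random_measure():
    """A random probability measure on the 8 worlds."""
    raw = [random.random() for _ in range(8)]
    s = sum(raw)
    return [x / s for x in raw]

def cond(P, num, den):
    """P(num | den) over sets of world indices, = 1 if P(den) = 0."""
    d = sum(P[i] for i in den)
    return 1.0 if d == 0 else sum(P[i] for i in num & den) / d

# Search for a violation of Unc(beta & gamma | alpha) <= Unc(beta | alpha) + Unc(gamma | alpha).
violations = 0
for _ in range(2000):
    P = random_measure()
    lhs = 1 - cond(P, B & C, A)
    rhs = (1 - cond(P, B, A)) + (1 - cond(P, C, A))
    if lhs > rhs + 1e-12:
        violations += 1
```

No violation turns up; a Monte Carlo search is of course no proof, only a sanity check of the uncertainty reading of And.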

According to the next semantics, a high probability conditional is satisfied by

a certain kind of probability measure that ranks worlds by polynomial “orders of

magnitude”. A high probability conditional is satisfied by such a probability mea-

sure if its associated conditional probability is of the maximal order of magnitude

(compare Snow’s 1999 “atomic bound probabilities” and Benferhat et al. 1997

on their so-called “big-stepped probabilities”). The order-of-magnitude mapping

may also be seen as a selection function in the sense of Stalnaker (1968) or as de-

termining a special kind of sphere system of worlds in the sense of Lewis (1973).

This explains the formal correspondence between the logic of high probability

conditionals in the next subsection and Stalnaker’s and Lewis’ logical systems for

counterfactuals. But the intended interpretation of Stalnaker’s and Lewis’ order-

ings in terms of similarity or closeness to the actual world differs from the purely

probabilistic ordering of worlds below. Probabilistic order-of-magnitude models

are also close to ranked models along the lines of Kraus, Lehmann and Magidor

(1990) and Lehmann and Magidor (1992)—which explains the formal correspon-

dence between the logic in the next subsection and systems well-known from

nonmonotonic reasoning—and to ranking functions (or ordinal conditional func-

tions) in the sense of Spohn (1988, 2012). This is what this order of magnitude

semantics looks like in more formal terms:

Definition 4 (Order of Magnitude Semantics for High Probability Conditionals)

• A probabilistic order-of-magnitude model Mom for high probability condi-

tionals is a bijective mapping om : W → {0, . . . , n − 1}.

(So om is both one-to-one and onto: om(w) is the “probabilistic order of

magnitude” of w. The cardinality of W, card(W), is n.)

• Relative to a probabilistic order-of-magnitude model Mom (= om), and rela-

tive to some “small” real number v ∈ [0, 1] (say, v < 1/2), we can define:

– Let Pom be the unique probability measure that satisfies:

Pom({w}) = v^om(w) · (1 − v) for om(w) < card(W) − 2,

Pom({w}) = v^(card(W)−2) for om(w) = card(W) − 2,

Pom({w}) = 0 for om(w) = card(W) − 1.

Mom ⊨om α ⇒ β

iff Pom(β|α) > 1 − v

(that is: Pom(β|α) is “high” or corresponds to the highest order of

magnitude v^0 · (1 − v) = 1 − v).


Note that whether Mom ⊨om α ⇒ β or not is actually independent of the

exact choice of v.

• Mom ⊨om KB⇒ iff for every α ⇒ β ∈ KB⇒ it holds that Mom ⊨om α ⇒ β.

• We say that

KB⇒ ⊨om α ⇒ β

(KB⇒ order-of-magnitude-entails α ⇒ β) iff

for every probabilistic order-of-magnitude model Mom:

if Mom ⊨om KB⇒, then Mom ⊨om α ⇒ β.
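The measure Pom can be written out directly (a sketch of ours; `om_rank` gives each world's order of magnitude, and we render the middle clause as v^(card(W)−2) so that the singleton probabilities sum to one):

```python
def p_om(om_rank, v):
    """Singleton probabilities of P_om for a bijective ranking om_rank of the
    worlds (om_rank[i] is the order of magnitude of world i) and a small v."""
    n = len(om_rank)
    probs = []
    for r in om_rank:
        if r < n - 2:
            probs.append(v ** r * (1 - v))   # v^om(w) * (1 - v)
        elif r == n - 2:
            probs.append(v ** (n - 2))       # the remaining mass
        else:
            probs.append(0.0)                # the bottom world gets probability 0
    return probs

# Four worlds ranked 0, 1, 2, 3 with v = 0.1: probabilities 0.9, 0.09, 0.01, 0.
probs = p_om([0, 1, 2, 3], v=0.1)
total = sum(probs)
```

Each world's probability dwarfs the combined probability of all worlds of higher rank, which is what makes such measures behave like the “big-stepped” probabilities mentioned above.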

The next semantics defines a set of high probability conditionals to entail a

high probability conditional, if, whenever the conditional probabilities that are

associated with the premises are “close” to 1 (where the referent of ‘close’ is

determined relative to the number of premises), the conditional probability that is

associated with the conclusion, say, α ⇒ β, is greater than 1/2 and hence greater

than the conditional probability that is associated with α ⇒ ¬β. Since in any

such case the set of β-worlds constitutes the “majority” within the set of α-worlds

(as measured by the probability measure in question), we call this the ‘majority

semantics’. Logical consequence given by this semantics therefore consists in the

premises making the conclusion more likely than not:

Definition 5 (Majority Semantics for High Probability Conditionals)


• Let KB⇒ = {ϕ1 ⇒ ψ1, . . . , ϕn ⇒ ψn}:

We say that

KB⇒ ⊨maj α ⇒ β

(KB⇒ majority-entails α⇒ β) iff

for all probability measures P:

if P(ψ1|ϕ1) > 1 − 1/(2n), . . . , P(ψn|ϕn) > 1 − 1/(2n), then P(β|α) > 1/2

(that is, if the premise probabilities are “high”, then P(β|α) is greater than

P(¬β|α)).
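As an illustration (ours), take the two-premise inference from α ⇒ β and α ⇒ γ to (α ∧ β) ⇒ γ: with n = 2 premises the majority semantics requires that whenever both premise probabilities exceed 1 − 1/(2 · 2) = 3/4, the conclusion probability P(γ|α ∧ β) exceeds 1/2. A randomized check (the bit-pattern world encoding is illustrative):

```python
import random

random.seed(1)

# Worlds 0..7 encode truth values of alpha, beta, gamma in bits 0, 1, 2.
A = {i for i in range(8) if i & 1}
B = {i for i in range(8) if i & 2}
C = {i for i in range(8) if i & 4}

def cond(P, num, den):
    """P(num | den) over sets of world indices, = 1 if P(den) = 0."""
    d = sum(P[i] for i in den)
    return 1.0 if d == 0 else sum(P[i] for i in num & den) / d

failures = 0
trials = 0
for _ in range(20000):
    raw = [random.random() for _ in range(8)]
    s = sum(raw)
    P = [x / s for x in raw]
    # Only measures making both premises "high" (probability > 3/4) count.
    if cond(P, B, A) > 0.75 and cond(P, C, A) > 0.75:
        trials += 1
        if cond(P, C, A & B) <= 0.5:
            failures += 1
```

In every sampled measure satisfying the premise condition, the conclusion indeed comes out more likely than not, in line with the equivalence of the majority semantics with the other consequence relations (theorem 7 below).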

The final semantics for high probability conditionals that we will discuss was

suggested by Lehmann and Magidor (1992), pp.48–53, and it is special in so far

as it presupposes the nonstandard analysis of real numbers. Nonstandard analysis

adds infinitely small numbers (the so-called ‘infinitesimals’) and infinitely large

numbers to the standard set of real numbers. Apart from the introduction to non-

standard analysis that is contained in Lehmann and Magidor (1992) itself, brief but

useful accounts of nonstandard analysis can also be found in section 4.4 of Chang

and Keisler (1990), and in more informal terms, in Adams (1998), pp.253–256.

Definition 6 (Infinitesimal Semantics for High Probability Conditionals)

• An infinitesimal probabilistic model Minf for high probability conditionals

is a nonstandard probability measure P : ℘(W) → [0, 1]∗, that is, probabili-

ties are nonstandard reals non-strictly between 0 and 1, such that P(W) = 1,

P(∅) = 0, and finite additivity is satisfied.


• Relative to an infinitesimal probabilistic model Minf (= P) we can define:

Minf ⊨inf α ⇒ β

iff 1 − P(β|α) is infinitesimal, that is,

for all standard reals ε ∈ R with ε > 0: 1 − P(β|α) < ε

(that is: P(β|α) is either identical to 1 or “infinitely close” to 1).

• Minf ⊨inf KB⇒ iff for every α ⇒ β ∈ KB⇒ it holds that Minf ⊨inf α ⇒ β.

• We say that

KB⇒ ⊨inf α ⇒ β

(KB⇒ infinitesimally entails α ⇒ β) iff

for every infinitesimal probabilistic model Minf:

if Minf ⊨inf KB⇒, then Minf ⊨inf α ⇒ β.

This concludes our series of semantical systems for high probability condi-

tionals.

We are now ready to turn to a comparison between these different versions of a

high probability semantics in terms of their respective logical consequence rela-

tions. Surprisingly, the semantic systems that we presented in this subsection turn

out to be extensionally mutually equivalent in the following sense (for the proof

of this theorem, and for information on which proofs in the relevant literature the

theorem is based, see Leitgeb 2004, pp.177f):


Theorem 7 (Equivalence of the Different Versions of Probability Semantics with

respect to Entailment)

Let KB⇒ ⊆ L⇒, α⇒ β ∈ L⇒; the following claims are equivalent:

1. KB⇒ ⊨cont α ⇒ β

2. KB⇒ ⊨seq α ⇒ β

3. KB⇒ ⊨unc α ⇒ β

4. KB⇒ ⊨om α ⇒ β

5. KB⇒ ⊨maj α ⇒ β

6. KB⇒ ⊨inf α ⇒ β.

In the next subsection, we will determine the very consequence relation that

corresponds to these semantic systems in proof-theoretic terms.

3.2 Proof Theory for High Probability Conditionals

Consider the following rules of inference for conditionals in L⇒ (where ‘⊢’ de-

notes the derivability relation of classical propositional logic):

• α ⇒ α   (Reflexivity)

• From α ⊢ β, β ⊢ α, and α ⇒ γ, infer β ⇒ γ   (Left Equivalence)

• From γ ⇒ α and α ⊢ β, infer γ ⇒ β   (Right Weakening)

• From (α ∧ β) ⇒ γ and α ⇒ β, infer α ⇒ γ   (Cautious Cut)

• From α ⇒ β and α ⇒ γ, infer (α ∧ β) ⇒ γ   (Cautious Monotonicity)

Note that Reflexivity is premise-free (so it is really an axiom scheme).

Kraus, Lehmann and Magidor (1990), section 3, refer to the system of rules

above as the system C of cumulative reasoning (although they spell things out

in terms of nonmonotonic consequence relations rather than in terms of condi-

tionals). Cumulativity, that is, Cautious Cut and Cautious Monotonicity taken

together, has been suggested by Gabbay (1984) to be a valid closure property of

plausible reasoning: Cautious Monotonicity expresses that importing consequents

(such as β) into an antecedent (so that α is turned into α∧β) does not subtract from

the original antecedent’s (α’s) inferential power. In turn, Cautious Cut expresses

that importing consequents in this way does not add to the antecedent’s inferential

power either: consider the denial of the conclusion α ⇒ γ. Then at least one of the

two premises does not hold; so if α ⇒ β does hold, so that β is a consequence of α,

then we cannot infer γ by importing that consequence into the antecedent.

Furthermore, we also consider the following rule:

• From α ⇒ γ and β ⇒ γ, infer (α ∨ β) ⇒ γ   (Disjunction)
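The Disjunction rule is probabilistically sound in the sense of the uncertainty semantics of the previous subsection: Unc(γ|α ∨ β) ≤ Unc(γ|α) + Unc(γ|β) for every probability measure. A brief numeric spot-check (our own sketch; the world encoding is illustrative):

```python
import random

random.seed(2)

# Worlds 0..7 encode truth values of alpha, beta, gamma in bits 0, 1, 2.
A = {i for i in range(8) if i & 1}
B = {i for i in range(8) if i & 2}
C = {i for i in range(8) if i & 4}

def cond(P, num, den):
    """P(num | den) over sets of world indices, = 1 if P(den) = 0."""
    d = sum(P[i] for i in den)
    return 1.0 if d == 0 else sum(P[i] for i in num & den) / d

# Search for a violation of Unc(gamma | alpha or beta) <= Unc(gamma | alpha) + Unc(gamma | beta).
violations = 0
for _ in range(5000):
    raw = [random.random() for _ in range(8)]
    s = sum(raw)
    P = [x / s for x in raw]
    lhs = 1 - cond(P, C, A | B)
    rhs = (1 - cond(P, C, A)) + (1 - cond(P, C, B))
    if lhs > rhs + 1e-12:
        violations += 1
```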


The system that results from adding the Disjunction rule to system C is called

the system P of preferential reasoning by Kraus, Lehmann and Magidor (1990),

section 5. This stronger system P is one of the standard systems of nonmonotonic

logic, and it turns out to be sound and complete with respect to many different se-

mantics of nonmonotonic logic (some of them are collected in Gabbay et al. 1994;

see also Gardenfors and Makinson 1994, Chapter 4.3 of Fuhrmann 1997, Benfer-

hat et al. 1997, and Benferhat et al. 2000). Psychological findings, though still on

a very preliminary level, indicate that P incorporates some of the rationality pos-

tulates governing human commonsense reasoning with conditionals (see Pfeifer

and Kleiter 2005, 2010). P also coincides with the “flat” fragment of Stalnaker’s

and Lewis’ logic(s) for counterfactuals.

The derivability of conditionals α ⇒ β from a finite set KB⇒ of conditionals

by means of the rules above—resulting in the deductive consequence relations ⊢C

and ⊢P, respectively—is defined just as usual, that is, analogously to the definition

of derivability of formulas from formulas in classical propositional logic.

The following rules can be shown to be (meta-)derivable from the systems

introduced above:

Lemma 8 (Kraus, Lehmann and Magidor 1990, pp.179–180)

The following rules are derivable in C (that is: from Reflexivity+Left Equiva-

lence+Right Weakening+Cautious Cut+Cautious Monotonicity):

1. From α ⇒ β and α ⇒ γ, infer α ⇒ (β ∧ γ)   (And)

2. From α ⇒ β, β ⇒ α, and α ⇒ γ, infer β ⇒ γ   (Equivalence)

3. From α ⇒ (β → γ) and α ⇒ β, infer α ⇒ γ   (Modus Ponens in the Consequent)

4. From α ⊢ β, infer α ⇒ β   (Supra-Classicality)

Lemma 9 (Kraus, Lehmann and Magidor 1990, p.191)

The following rules are derivable in P (that is: from Reflexivity+Left Equiva-

lence+Right Weakening+Cautious Cut+Cautious Monotonicity+Disjunction; we

label the derivable rules in the same way as Kraus, Lehmann and Magidor 1990):

1. From (α ∧ β) ⇒ γ, infer α ⇒ (β → γ)   (S)

2. From (α ∧ β) ⇒ γ and (α ∧ ¬β) ⇒ γ, infer α ⇒ γ   (D)

Finally, we can relate the semantic systems of the previous subsection to the

system of rules specified above by means of a soundness and completeness theo-

rem (see Leitgeb 2004, chapter 10, for the proof, and for the proofs in the relevant

parts of the literature on which the theorem is based):

Theorem 10 (Soundness and Completeness of P)


Let KB⇒ ⊆ L⇒, α ⇒ β ∈ L⇒; then each of the claims in theorem 7 is

equivalent to:

KB⇒ ⊢P α ⇒ β

That is: the system P is sound and complete with respect to the continuity seman-

tics, the sequence semantics, the uncertainty semantics, the order of magnitude

semantics, the majority semantics, and the infinitesimal semantics for high prob-

ability conditionals.

In contrast, none of the following rules are (meta-)derivable in P nor are they

valid with respect to any of the semantics of the last subsection, even though their

counterparts for material conditionals are of course valid:

• From α ⇒ β, infer ¬β ⇒ ¬α   (Contraposition)

• From α ⇒ β and β ⇒ γ, infer α ⇒ γ   (Transitivity)

• From α ⇒ γ, infer (α ∧ β) ⇒ γ   (Monotonicity; Strengthening of the Antecedent)
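A single concrete measure already witnesses the failure of Monotonicity and Contraposition (a countermodel of our own construction, in the spirit of the classic birds-fly-but-penguins-do-not example): P(β|α) is high, yet P(β|α ∧ γ) and P(¬α|¬β) are not.

```python
# Worlds assign truth values to (alpha, beta, gamma); the measure concentrates
# the alpha-worlds on beta-worlds, except for a small exceptional gamma-region.
P = {
    (True,  True,  False): 0.900,
    (True,  False, True):  0.050,
    (True,  True,  True):  0.001,
    (False, False, False): 0.049,
}

def prob(pred):
    return sum(p for w, p in P.items() if pred(w))

def cond(num, den):
    """P(num | den) for predicates on worlds, = 1 if P(den) = 0."""
    d = prob(den)
    return 1.0 if d == 0 else prob(lambda w: num(w) and den(w)) / d

a = lambda w: w[0]
b = lambda w: w[1]
c = lambda w: w[2]

p_b_given_a = cond(b, a)                                   # about 0.947: high
p_b_given_ac = cond(b, lambda w: a(w) and c(w))            # about 0.020: Monotonicity fails
p_nota_given_notb = cond(lambda w: not a(w),
                         lambda w: not b(w))               # about 0.495: Contraposition fails
```

So α ⇒ β holds on any reasonable “high probability” reading, while (α ∧ γ) ⇒ β and ¬β ⇒ ¬α clearly do not.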

As Bennett (2003) argues in his chapter 9 (and as had been argued before by,

e.g., Adams 1975 and Edgington 1995), none of these rules of inference is par-

ticularly plausible for the indicative if-then in natural language. Accordingly, in

nonmonotonic reasoning all of these rules are normally given up as applying to

default conditionals or (if reformulated accordingly) nonmonotonic consequence


relations. However, we have already seen weakenings of these three rules to be

contained in system P: in particular, Cautious Cut may be regarded as a weak-

ening of Transitivity, and Cautious Monotonicity is clearly a weakened version

of Monotonicity. (See Johnson and Parikh 2008 for an argument that, in a sense

explained in their paper, the monotonicity rule is nevertheless “almost valid” for

probabilistic conditionals.)

The exchange of ideas between logic and probability theory, and the system-

atic study of jointly logical and probabilistic systems, has had a favourable effect

on both areas in the past. It may have an even more favourable effect on the two

areas in the future.

Acknowledgements: We are grateful to Stanislav Speranski, Alan Hajek, John

Cusbert, and Edward Elliott for comments on a previous draft of this chapter.

Work on this paper was supported generously by the Alexander von Humboldt

Foundation.

References

[1] Abadi, M. and Halpern, J.Y., 1994: “Decidability and Expressiveness for

First-Order Logics of Probability”, Information and Computation 112, 1–

36.

[2] Adams, E.W., 1966: “Probability and the Logic of Conditionals”, in: Hin-

tikka and Suppes (1966), 265–316.


[3] Adams, E.W., 1974: “The Logic of ‘Almost All”’, Journal of Philosophical

Logic 3, 3–17.

[4] Adams, E.W., 1975: The Logic of Conditionals, Dordrecht: D. Reidel.

[5] Adams, E.W., 1986: “On the Logic of High Probability”, Journal of Philo-

sophical Logic 15, 255–279.

[6] Adams, E.W., 1996: “Four Probability-Preserving Properties of Infer-

ences”, Journal of Philosophical Logic 25, 1–24.

[7] Adams, E.W., 1998: A Primer of Probability Logic, Stanford: CSLI Lec-

ture Notes.

[8] Adams, E.W. and Levine, H.P., 1975: “On the Uncertainties Transmit-

ted from Premisses to Conclusions in Deductive Inferences”, Synthese 30,

429–460.

[9] Arlo-Costa, H., 2005: “Non-Adjunctive Inference and Classical Modali-

ties”, Journal of Philosophical Logic 34, 581–605.

[10] Arlo-Costa, H. and Parikh, R., 2005: “Conditional Probability and Defea-

sible Inference”, Journal of Philosophical Logic 34, 97–119.

[11] Bacchus, F., 1990a: “On Probability Distributions Over Possible Worlds”,

in: Proceedings of the Fourth Annual Conference on Uncertainty in Artifi-

cial Intelligence, UAI’1988, Amsterdam: North-Holland, 217–226.


[12] Bacchus, F., 1990b: Representing and Reasoning with Probabilistic Knowl-

edge, Cambridge: The MIT Press.

[13] Bacchus, F., Grove, A.J., Halpern, J.Y., and Koller, D., 1996: “From Sta-

tistical Knowledge Bases to Degrees of Belief”, Artificial Intelligence 87,

75–143.

[14] Baltag, A. and Smets, S., 2008: “Probabilistic Dynamic Belief Revision”,

Synthese 165, 179–202.

[15] Bamber, D., 2000: “Entailment with Near Surety of Scaled Assertions of

High Conditional Probability”, Journal of Philosophical Logic 29, 1–74.

[16] Benferhat, S., Dubois, D., and Prade, H., 1997: “Possibilistic and Standard

Probabilistic Semantics of Conditional Knowledge”, Journal of Logic and

Computation 9, 873–895.

[17] Benferhat, S., Saffiotti, A., and Smets, P., 2000: “Belief Functions and

Default Reasoning,” Artificial Intelligence 122, 1–69.

[18] Bennett, J., 2003: A Philosophical Guide to Conditionals, Oxford: Claren-

don Press.

[19] Van Benthem, J., Gerbrandy, J., and Kooi, B., 2009: “Dynamic Update

with Probabilities”, Studia Logica 93, 67–96.

[20] Biazzo, V., Gilio, A., Lukasiewicz, T., and Sanfilippo, G., 2002: “Proba-

bilistic Logic under Coherence, Model-Theoretic Probabilistic Logic, and


Default Reasoning in System P”, Journal of Applied Non-Classical Logics

12, 189–213.

[21] Brewka, G. (ed.), 1996: Principles of Knowledge Representation, Stanford:

CSLI Publications and FoLLI.

[22] Brewka, G., Dix, J., Konolige, K., 1997: Nonmonotonic Reasoning. An

Overview, Stanford: CSLI Lecture Notes 73.

[23] Boole, G., 1854: An Investigation of The Laws of Thought on Which are

Founded the Mathematical Theories of Logic and Probabilities, London:

Macmillan.

[24] Buchak, L., forthcoming: “Belief, Credence, and Norms”, Philosophical

Studies.

[25] Burgess, J.P., 1969: “Probability Logic”, The Journal of Symbolic Logic

34, 264–274.

[26] Caie, M., 2013: “Rational Probabilistic Incoherence”, Philosophical Re-

view 122, 527–575.

[27] Carnap, R., 1950: Logical Foundations of Probability, Chicago: University

of Chicago Press.

[28] Chang, C.C., Keisler, H.J., 1990: Model Theory, Amsterdam: North-

Holland.

[29] Christensen, D., 2004: Putting Logic in Its Place, Oxford: Clarendon Press.


[30] Christiano, P., Yudkowsky, E., Herreshoff, M., and Barasz, M., unpub-

lished: “Definability of Truth in Probabilistic Logic”, unpublished draft.

[31] Cross, C.B., 1993: “From Worlds to Probabilities: A Probabilistic Seman-

tics for Modal Logic”, Journal of Philosophical Logic 22, 169–192.

[32] Dubois, D. and Prade, H., 1996: “Non-Standard Theories of Uncertainty in

Plausible Reasoning”, in: G. Brewka (1996), 1–32.

[33] Edgington, D., 1995: “On Conditionals”, Mind 104, 235–329.

[34] Edgington, D., 1996: “Vagueness by Degrees”, in: R. Keefe and P. Smith

(eds.), Vagueness: A Reader, Cambridge: MIT Press, 617–630.

[35] Fagin, R., 1976: “Probabilities on Finite Models”, Journal of Symbolic

Logic 41, 50–58.

[36] Fagin, R. and Halpern, J.Y., 1994: “Reasoning about Knowledge and Prob-

ability”, Journal of the ACM 41, 340–367.

[37] Fagin, R., Halpern, J.Y., and Megiddo, N., 1990: “A Logic for Reasoning

About Probabilities”, Information and Computation 87, 78–128.

[38] Fenstad, J.E., 1967: “Representations of Probabilities Defined on First Or-

der Languages”, in: J.N. Crossley (ed.), Sets, Models and Recursion The-

ory, Amsterdam: North-Holland, 156–172.

[39] Field, H., 1977: “Logic, Meaning, and Conceptual Role”, The Journal of

Philosophy 74, 379–409.


[40] Field, H., 2009: “What is the Normative Role of Logic?”, Proceedings of the Aristotelian Society Supplementary Volume LXXXIII, 251–268.

[41] Foley, R., 1993: Working Without a Net, Oxford: Oxford University Press.

[42] Van Fraassen, B., 1981: “Probabilistic Semantics Objectified: I. Postulates and Logics”, Journal of Philosophical Logic 10, 371–394.

[43] Van Fraassen, B., 1995: “Belief and the Problem of Ulysses and the Sirens”, Philosophical Studies 77, 7–37.

[44] Frisch, A.M. and Haddawy, P., 1988: “Probability as a Modal Operator”, in: Proceedings of the 4th Workshop on Uncertainty in AI, Minneapolis, MN, 109–118.

[45] Frisch, A.M. and Haddawy, P., 1994: “Anytime Deduction for Probabilistic Logic”, Artificial Intelligence 69, 93–122.

[46] Fuhrmann, A., 1997: An Essay on Contraction, Stanford: CSLI Publications.

[47] Gabbay, D.M., 1984: “Theoretical Foundations for Non-Monotonic Reasoning in Expert Systems”, in: K.R. Apt (ed.), Logics and Models of Concurrent Systems, Berlin: Springer, 439–458.

[48] Gabbay, D.M., Hogger, C.J., and Robinson, J.A. (eds.), 1994: Handbook of Logic in Artificial Intelligence and Logic Programming 3, Oxford: Clarendon Press.

[49] Gaifman, H., 1964: “Concerning Measures in First Order Calculi”, Israel Journal of Mathematics 2, 1–18.

[50] Gaifman, H. and Snir, M., 1982: “Probabilities Over Rich Languages, Testing and Randomness”, The Journal of Symbolic Logic 47, 495–548.

[51] Gaifman, H., 1986: “A Theory of Higher Order Probabilities”, in: Proceedings of the Conference on Theoretical Aspects of Reasoning about Knowledge, Monterey, California, 275–292.

[52] Gärdenfors, P., 1975: “Qualitative Probability as an Intensional Logic”, Journal of Philosophical Logic 4, 171–185.

[53] Gärdenfors, P. and Makinson, D., 1994: “Nonmonotonic Inference Based on Expectations”, Artificial Intelligence 65, 197–245.

[54] Goldszmidt, M. and Pearl, J., 1996: “Qualitative Probabilities for Default Reasoning, Belief Revision, and Causal Modeling”, Artificial Intelligence 84, 57–112.

[55] Haenni, R., 2005: “Unifying Logical and Probabilistic Reasoning”, in: L. Godo (ed.), Symbolic and Quantitative Approaches to Reasoning with Uncertainty, Lecture Notes in Artificial Intelligence Vol. 3571, Berlin: Springer, 788–799.

[56] Hájek, A., 1989: “Probabilities of Conditionals – Revisited”, Journal of Philosophical Logic 18, 423–428.

[57] Hailperin, T., 1937: “Foundations of Probability in Mathematical Logic”, Philosophy of Science 4, 125–150.

[58] Hailperin, T., 1984: “Probability Logic”, Notre Dame Journal of Formal Logic 25, 198–212.

[59] Hailperin, T., 1996: Sentential Probability Logic, Bethlehem, PA: Lehigh University Press.

[60] Hailperin, T., 2000: “Probability Semantics for Quantifier Logic”, Journal of Philosophical Logic 29, 207–239.

[61] Halpern, J.Y., 1990: “An Analysis of First-Order Logics of Probability”, Artificial Intelligence 46, 311–350.

[62] Halpern, J.Y., 1991: “The Relationship between Knowledge, Belief, and Certainty”, Annals of Mathematics and Artificial Intelligence 4, 301–322.

[63] Halpern, J.Y., 2001: “Lexicographic Probability, Conditional Probability, and Nonstandard Probability”, in: Proceedings of the Eighth Conference on Theoretical Aspects of Rationality and Knowledge, Ithaca, NY: Morgan Kaufmann, 17–30.

[64] Halpern, J.Y., 2003: Reasoning About Uncertainty, Cambridge, Mass.: The MIT Press.

[65] Halpern, J.Y. and Rabin, M.O., 1987: “A Logic to Reason about Likelihood”, Artificial Intelligence 32, 379–405.

[66] Hamblin, C.L., 1959: “The Modal ‘Probably’”, Mind 68, 234–240.

[67] Hawthorne, J., 1996: “On the Logic of Nonmonotonic Conditionals and Conditional Probabilities”, Journal of Philosophical Logic 25, 185–218.

[68] Hawthorne, J., 2007: “Nonmonotonic Conditionals that Behave Like Conditional Probabilities Above a Threshold”, Journal of Applied Logic 5, 625–637.

[69] Hawthorne, J. and Makinson, D., 2007: “The Quantitative/Qualitative Watershed for Rules of Uncertain Inference”, Studia Logica 86, 247–297.

[70] Heifetz, A. and Mongin, P., 2001: “Probability Logic for Type Spaces”, Games and Economic Behavior 35, 31–53.

[71] Hempel, C.G., 1962: “Deductive-Nomological vs Statistical Explanation”, in: H. Feigl and G. Maxwell (eds.), Minnesota Studies in the Philosophy of Science III, Minneapolis: University of Minnesota Press, 98–169.

[72] Hilpinen, R., 1968: Rules of Acceptance and Inductive Logic, Acta Philosophica Fennica 22, Amsterdam: North-Holland.

[73] Hintikka, J. and Suppes, P. (eds.), 1966: Aspects of Inductive Logic, Amsterdam: North-Holland.

[74] Hoover, D.N., 1978: “Probability Logic”, Annals of Mathematical Logic 14, 287–313.

[75] Howson, C., 2003: “Probability and Logic”, Journal of Applied Logic 1, 151–165.

[76] Huber, F. and Schmidt-Petri, C. (eds.), 2009: Degrees of Belief, Synthese Library 342, Dordrecht: Springer.

[77] Johnson, M. and Parikh, R., 2008: “Probabilistic Conditionals are Almost Monotonic”, Review of Symbolic Logic 1, 73–80.

[78] Keisler, H.J., 1985: “Probability Quantifiers”, in: J. Barwise and S. Feferman (eds.), Model-Theoretic Logics, New York: Springer, 509–556.

[79] Kooi, B.P., 2003: “Probabilistic Dynamic Epistemic Logic”, Journal of Logic, Language and Information 12, 381–408.

[80] Kraus, S., Lehmann, D., and Magidor, M., 1990: “Nonmonotonic Reasoning, Preferential Models and Cumulative Logics”, Artificial Intelligence 44, 167–207.

[81] Kyburg, H.E., Jr., 1961: Probability and the Logic of Rational Belief, Middletown: Wesleyan University Press.

[82] Lando, T., 2010: “Completeness of S4 for the Lebesgue Measure Algebra”, Journal of Philosophical Logic 41, 287–316.

[83] Leblanc, H., 1979: “Probabilistic Semantics for First-Order Logic”, Zeitschrift für mathematische Logik und Grundlagen der Mathematik 25, 497–509.

[84] Leblanc, H., 1983: “Alternatives to Standard First-Order Semantics”, in: D. Gabbay and F. Guenthner (eds.), Handbook of Philosophical Logic, Volume I, Dordrecht: Reidel, 189–274.

[85] Lehmann, D. and Magidor, M., 1992: “What Does a Conditional Knowledge Base Entail?”, Artificial Intelligence 55, 1–60.

[86] Leitgeb, H., 2004: Inference on the Low Level. An Investigation into Deduction, Nonmonotonic Reasoning, and the Philosophy of Cognition, Dordrecht: Kluwer, Applied Logic Series.

[87] Leitgeb, H., 2012a: “A Probabilistic Semantics for Counterfactuals. Part A”, Review of Symbolic Logic 5, 16–84.

[88] Leitgeb, H., 2012b: “A Probabilistic Semantics for Counterfactuals. Part B”, Review of Symbolic Logic 5, 85–121.

[89] Leitgeb, H., 2012c: “From Type-Free Truth to Type-Free Probability”, in: G. Restall and G. Russell (eds.), New Waves in Philosophical Logic, New York: Palgrave Macmillan, 84–93.

[90] Leitgeb, H., 2014: “The Stability Theory of Belief”, The Philosophical Review 123, 131–171.

[91] Levi, I., 1967: Gambling with the Truth. An Essay on Induction and the Aims of Science, Cambridge, Mass.: The MIT Press.

[92] Lewis, D., 1973: Counterfactuals, Oxford: Basil Blackwell.

[93] Lewis, D.K., 1976: “Probabilities of Conditionals and Conditional Probabilities”, The Philosophical Review 85, 297–315. Reprinted in D.K. Lewis, Philosophical Papers, Vol. II, Oxford: Oxford University Press, 1986, 133–156.

[94] Lewis, D.K., 1980: “A Subjectivist’s Guide to Objective Chance”, in: R. Jeffrey (ed.), Studies in Inductive Logic and Probability, Vol. II, Berkeley: University of California Press, 263–293. Reprinted in D.K. Lewis, Philosophical Papers, Vol. II, Oxford: Oxford University Press, 1986, 83–132.

[95] Lin, H. and Kelly, K.T., 2012: “Propositional Reasoning that Tracks Probabilistic Reasoning”, Journal of Philosophical Logic 41, 957–981.

[96] Maher, P., 1993: Betting on Theories, Cambridge: Cambridge University Press.

[97] Makinson, D., 1989: “General Theory of Cumulative Inference”, in: M. Reinfrank et al. (eds.), Non-Monotonic Reasoning, Lecture Notes in Artificial Intelligence Vol. 346, Berlin: Springer, 1–18.

[98] Makinson, D., 1994: “General Patterns in Nonmonotonic Reasoning”, in: Gabbay et al. (1994), 35–110.

[99] Makinson, D., 2011: “Conditional Probability in the Light of Qualitative Belief Change”, Journal of Philosophical Logic 40, 121–153.

[100] Makinson, D., 2012: “Logical Questions behind the Lottery and Preface Paradoxes: Lossy Rules for Uncertain Inference”, Synthese 186, 511–529.

[101] McGee, V., 1989: “Conditional Probabilities and Compounds of Conditionals”, The Philosophical Review 98, 485–541.

[102] Morgan, C., 1982: “Simple Probabilistic Semantics for Modal Logic”, Journal of Philosophical Logic 11, 443–458.

[103] Nilsson, N., 1986: “Probabilistic Logic”, Artificial Intelligence 28, 71–87.

[104] Paris, J. and Simmonds, R., 2009: “O Is Not Enough”, Review of Symbolic Logic 2, 298–309.

[105] Paris, J., 2011: “Pure Inductive Logic”, in: L. Horsten and R. Pettigrew (eds.), The Continuum Companion to Philosophical Logic, London: Continuum, 428–449.

[106] Pearl, J., 1988: Probabilistic Reasoning in Intelligent Systems, San Mateo: Morgan Kaufmann.

[107] Pearl, J. and Goldszmidt, M., 1996: “Probabilistic Foundations of Qualitative Reasoning with Conditional Sentences”, in: G. Brewka (1996), 33–68.

[108] Pfeifer, N. and Kleiter, G.D., 2005: “Coherence and Nonmonotonicity in Human Reasoning”, Synthese 146, 93–109.

[109] Pfeifer, N. and Kleiter, G.D., 2010: “The Conditional in Mental Probability Logic”, in: M. Oaksford and N. Chater (eds.), Cognition and Conditionals: Probability and Logic in Human Thought, Oxford: Oxford University Press, 153–173.

[110] Popper, K.R., 1955: “Two Autonomous Axiom Systems for the Calculus of Probabilities”, British Journal for the Philosophy of Science 6, 51–57.

[111] Ramsey, F.P., 1926: “Truth and Probability”, in: F.P. Ramsey, The Foundations of Mathematics and other Logical Essays, edited by R.B. Braithwaite, London: Kegan Paul, 1931, 156–198.

[112] Rescher, N., 1962: “A Probabilistic Approach to Modal Logic”, Acta Philosophica Fennica 16, 215–226.

[113] Richardson, M. and Domingos, P., 2006: “Markov Logic Networks”, Machine Learning 62, 107–136.

[114] Roeper, P. and Leblanc, H., 1999: Probability Theory and Probability Semantics, Toronto: University of Toronto Press.

[115] Ross, J. and Schroeder, M., forthcoming: “Belief, Credence, and Pragmatic Encroachment”, Philosophy and Phenomenological Research.

[116] Schurz, G., 1997: “Probabilistic Default Logic Based on Irrelevance and Relevance Assumptions”, in: D.M. Gabbay et al. (eds.), Qualitative and Quantitative Practical Reasoning, Berlin: Springer, 536–553.

[117] Schurz, G., 1998: “Probabilistic Semantics for Delgrande’s Conditional Logic and a Counterexample to his Default Logic”, Artificial Intelligence 102, 81–95.

[118] Schurz, G., 2001: “What is ‘Normal’? An Evolution-Theoretic Foundation of Normic Laws and Their Relation to Statistical Normality”, Philosophy of Science 68, 476–497.

[119] Scott, D. and Krauss, P., 1966: “Assigning Probabilities to Logical Formulas”, in: Hintikka and Suppes (1966), 219–264.

[120] Segerberg, K., 1971: “Qualitative Probability in a Modal Setting”, in: J.E. Fenstad (ed.), Proceedings of the Second Scandinavian Logic Symposium, Amsterdam: North-Holland, 341–352.

[121] Snow, P., 1999: “Diverse Confidence Levels in a Probabilistic Semantics for Conditional Logics”, Artificial Intelligence 113, 269–279.

[122] Speranski, S.O., 2013: “Complexity for Probability Logic with Quantifiers over Propositions”, Journal of Logic and Computation 23, 1035–1055.

[123] Spohn, W., 1988: “Ordinal Conditional Functions: A Dynamic Theory of Epistemic States”, in: W.L. Harper and B. Skyrms (eds.), Causation in Decision, Belief Change, and Statistics, Vol. 2, Dordrecht: Reidel, 105–134.

[124] Spohn, W., 2012: The Laws of Belief: Ranking Theory and Its Philosophical Applications, Oxford: Oxford University Press.

[125] Stalnaker, R.C., 1968: “A Theory of Conditionals”, in: N. Rescher (ed.), Studies in Logical Theory, Oxford: Blackwell, 98–112.

[126] Stalnaker, R.C., 1970: “Probability and Conditionals”, Philosophy of Science 37, 64–80.

[127] Sturgeon, S., 2008: “Reason and the Grain of Belief”, Noûs 42, 139–165.

[128] Suppes, P., 1966: “Probabilistic Inference and the Concept of Total Evidence”, in: Hintikka and Suppes (1966), 49–65.

[129] Swain, M. (ed.), 1970: Induction, Acceptance and Rational Belief, Dordrecht: Reidel.

[130] Terwijn, S.A., 2005: “Probabilistic Logic and Induction”, Journal of Logic and Computation 15, 507–515.

[131] Wedgwood, R., 2012: “Outright Belief”, Dialectica 66, 309–329.

[132] Wheeler, G., 2007: “A Review of the Lottery Paradox”, in: W.L. Harper and G. Wheeler (eds.), Probability and Inference: Essays in Honor of Henry E. Kyburg Jr., London: King’s College Publications, 1–31.

[133] Yalcin, S., 2010: “Probability Operators”, Philosophy Compass 5, 916–937.