On proper (Popper?) and improper uses of information in epistemology



JAAKKO HINTIKKA Boston University

“Information” is a term John Austin would have liked.¹ It is an ugly word, it is a foreign word, and perhaps it does not mean very much. But it has one good thing about it: it is not a deep word. At least it is a less deep word than “knowledge”. If I claim that I know something, I am taking quite a plunge. I am liable to be challenged: “Do you know that you know?” In contrast, if I inform you about something, no one will make heavy weather about it. You can take my message or leave it. Even if it turns out that I was lying to you and there are no springs in Casablanca, you can resignedly say, “I was misinformed”.

“Information” is an even less deep word than “belief”. Everybody expects sometimes to receive contradictory items of information, and everybody is expected to learn to cope with such a predicament. No one needs elaborate theories to reject or to accept information. Such steps are all in a day’s epistemological work.

In this essay, I will put forward a number of theses about the notion of information. They are distillations of work I have done over a long period of time, mostly in a distant past. Looking back, and looking at what other philosophers have said of my work, I have come to realize that I have often left out the philosophical punch lines from what I have written. I have developed an intricate line of thought and I have followed up its consequences, but I have not spelled out its main philosophical impact forcefully enough. I am trying to supply some of the missing theses here. In doing so, I am not leaving myself much opportunity to argue for them. I can only hope that the matter-of-factness of the notion of information, together with my earlier work, lends enough plausibility to these theses to make them interesting.

¹ Cf. Austin, J., “Performative Utterances”, Philosophical Papers, Oxford: Clarendon Press, 1961.

(1) Information is specified by specifying which alternatives concerning reality it admits and which alternatives it excludes.²

This is the explanation of the pragmatic role of information. If I receive an item of information, and if this information is true, I can leave out of my preparations those alternatives which it excludes and concentrate on those that it admits.
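The admit/exclude picture of thesis (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy alternatives (weather and train service) and the helper names `admitted`, `excluded` and `it_will_rain` are hypothetical and do not come from the text.

```python
from itertools import product

# Hypothetical toy situation: two yes/no questions about tomorrow.
# Each alternative is one way this small part of reality could turn out.
ALTERNATIVES = frozenset(
    product(["rain", "dry"], ["train_runs", "train_cancelled"])
)


def admitted(alternatives, message):
    """Return the alternatives that the message (a predicate) admits."""
    return frozenset(a for a in alternatives if message(a))


def excluded(alternatives, message):
    """Return the alternatives that the message rules out."""
    return alternatives - admitted(alternatives, message)


def it_will_rain(alternative):
    """The item of information 'it will rain', viewed as a predicate."""
    return alternative[0] == "rain"


remaining = admitted(ALTERNATIVES, it_will_rain)
ruled_out = excluded(ALTERNATIVES, it_will_rain)
print(sorted(remaining))  # the cases one still has to prepare for
print(sorted(ruled_out))  # the cases one can disregard if the message is true
```

If the message is true, preparations can be restricted to `remaining`; nothing in the sketch says anything about whether or why the message should be accepted.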

This analysis of information is the basis of what is known as epistemic and doxastic logics, and is almost the only thing needed in those fields.³ Belief can be treated as accepted information, and knowledge as information whose acceptance is justified. However, not very much in doxastic logic depends on the nature of that acceptance, and very little depends in epistemic logic on the nature of the justification of that acceptance.

It follows that it is bad heuristics to try to explicate information in terms of belief change. The logical structure of information is one of the most basic and one of the simplest things in the wide and wonderful world of logical analysis.⁴ In contrast, the tactics and strategies of information acceptance and rejection (for that is what is involved in so-called belief change) are as subtle matters as one can possibly find in epistemology.

² There exists a rich flora of other uses of the term “information”. None of the other senses has as close a relationship to the pragmatic role of information, however. For a taxonomy of some of the other senses, see Nauta, D., The Meaning of Information, The Hague: Mouton, 1972.

³ Cf. here my forthcoming article on epistemic logic in the Routledge Encyclopedia of Philosophy.

⁴ This point can be put into a deeper perspective. A distinction can be made (and ought to be made) between two kinds of rules (or principles) in any strategic activity like knowledge-seeking. On the one hand you have the rules that define the game, e.g. how chessmen are moved on a board. They can be called definitory rules. They must be distinguished from rules, including rules of thumb, that deal with what is better and what is worse in the “game” in question. Definitory rules do not say anything about this subject. Rules which do can be called strategic rules. Now rules for belief change are clearly strategic rather than definitory rules in the game of information-seeking. As such, they are more complex than any “semantic” definition of information in terms of excluded alternatives. For we know from game theory that utilities are in principle associated with entire strategies, not with individual moves.

(2) The alternatives that are admitted or rejected in information do not normally concern the state or the history of the entire universe but only of some small part of it.

Here the reader’s first reaction undoubtedly is: Who ever thought otherwise? The answer is: Lots of philosophers have done so. For instance, Ian Hacking has claimed that Carnap’s inductive logic and, by implication, Carnap’s use of the notion of information are predicated on thinking in terms of the state of the entire world.⁵ The same vision is reflected in philosophers’ megalomaniac terminology of “possible worlds”. For a long time I interpreted such terminology as metaphoric, perhaps mediated by Jimmy Savage’s neat locution “small worlds” in speaking of what I took the intended applications of “possible-worlds semantics” to be.⁶ Only gradually have I come to realize, to my considerable consternation, that the likes of David Lewis⁷ and Alvin Plantinga⁸ are taking the metaphor literally.

Meanwhile I have realized the source of the “universalist” view in epistemology. It is the idea that our language has to be interpreted once and for all, so that whenever we are speaking of anything at all, we are indirectly speaking of everything.⁹ Quantifiers have on this view only one range, viz. everything. This was Frege’s explicit view, and he was not the only one who had the courage of his prejudices, or at least of his overall vision of language and its relation to the world. I have examined this view, which can be called the idea of language as the universal medium, and with an unwitting pun I have sometimes called it the universalist view of language. In spite of its outlandishness, it has played a major role in the development not only of logical theory but also of philosophy of language and philosophy in general. What we can see here is one of its manifestations in epistemology.

⁵ See Hacking, I., “The Leibniz-Carnap Program for Inductive Logic”, The Journal of Philosophy, 68, 1971, pp. 597-610.

⁶ Savage, L. J., The Foundations of Statistics, New York: John Wiley, 1954, especially pp. 82-91.

⁷ Lewis, D., On the Plurality of Worlds, Oxford: Basil Blackwell, 1986.

⁸ Plantinga, A., The Nature of Necessity, Oxford: Clarendon Press, 1974.

⁹ Cf. van Heijenoort, J., “Logic as Language and Logic as Calculus”, Synthese, 17, 1967, pp. 324-330; Hintikka, J., “On the Development of the Model-Theoretical Viewpoint in Logical Theory”, Synthese, 77, 1988, pp. 1-36; Kusch, M., Language as Calculus vs. Language as Universal Medium, Dordrecht: Kluwer Academic, 1989.

(3) Information and probability are inversely related.

This follows from the logical behavior of probability in its normal sense. The more alternatives a proposition admits of, the more probable and the less informative it is, and vice versa. This inverse relationship is at its clearest in purely logical definitions of probability, in so far as such definitions are possible (cf. below), but is by no means restricted to them. In fact, any half-way natural probability distribution creates as its mirror image a measure of information.
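The inverse relationship can be spelled out in one line for the a priori case. The notation is an assumption made for this sketch only: $W$ is a finite set of alternatives, $p$ an a priori distribution over $W$, $[S] \subseteq W$ the set of alternatives that a proposition $S$ admits, and $\operatorname{cont}(S) = 1 - P(S)$ one possible measure of information.

```latex
% Assumed notation: W = finite set of alternatives, p = a priori distribution on W,
% [S] = set of alternatives admitted by S.
\[
  P(S) = \sum_{w \in [S]} p(w), \qquad \operatorname{cont}(S) = 1 - P(S).
\]
% If S admits no alternative that T excludes, S is at most as probable as T
% and at least as informative:
\[
  [S] \subseteq [T]
  \;\Longrightarrow\; P(S) \le P(T)
  \;\Longrightarrow\; \operatorname{cont}(S) \ge \operatorname{cont}(T).
\]
```

The first implication uses only the fact that $p(w) \ge 0$ for every alternative, which is why the relationship does not depend on the particular distribution chosen.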

It is important to realize, however, that the inverse relationship holds without qualifications only for a priori probabilities and (absolute) information. Posterior probabilities (i.e. probabilities on evidence) and information need not always be inversely related. (Cf. here thesis (5) below.)

(4) A purely logical definition of information is impossible.

To my considerable embarrassment I have found myself labelled a defender of a purely logical conception of probability (and by implication of information) in the stamp of Rudolf Carnap.¹⁰ Yet the line of thought that Carnap started and I pushed further is as clear-cut a reductio ad absurdum of the purely logical conceptions of probability and information as one can hope to find in philosophy.¹¹ Carnap’s $\lambda$-continuum of inductive methods illustrates the situation. In it, we are observing individuals which can be classified as belonging to any one of $K$ cells. We have observed $N$ individuals of which $n$ belong to a given cell. What is the probability that the next one belongs to this particular cell? Given certain symmetry assumptions, the answer is

$$\frac{n + \lambda/K}{N + \lambda}$$

where $\lambda$ is a parameter, $0 \le \lambda$. But what does $\lambda$ mean? For a subjectivist, $\lambda$ is an index of caution. When $\lambda = 0$, the agent follows literally the observed relative frequency $n/N$; when $\lambda$ is large, the agent is reluctant to depart from the a priori symmetry considerations which suggest the probability $1/K$.

¹⁰ E.g. by Cohen, L. J., The Probable and the Provable, Oxford: Clarendon Press, 1977.

¹¹ Carnap, R., The Continuum of Inductive Methods, Chicago: University of Chicago Press, 1952.

For an objectivist, the optimal value of $\lambda$ is determined by the order in the universe, measured e.g. by its entropy.¹² A guess as to what the appropriate $\lambda$ is, is therefore a guess as to how orderly the universe (including its unknown parts) is.

But if so, no purely logical considerations can fix the value of $\lambda$. And this conclusion is reinforced if we relax Carnap’s symmetry requirements.¹³ For then it turns out that there is a large number of different dimensions of regularity and irregularity, each governed by a parameter whose value measures an objective feature of the world and hence cannot be determined by purely logical means.
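The role of Carnap’s parameter as an index of caution is easy to see numerically. The following is a minimal sketch of the predictive rule stated above; the function name `carnap_predictive` and the sample evidence (8 of 10 individuals in one of 2 cells) are assumptions made for illustration.

```python
from fractions import Fraction


def carnap_predictive(n, N, K, lam):
    """Probability that the next individual falls in a given cell.

    n:   observed individuals in that cell
    N:   total number of observed individuals
    K:   number of cells
    lam: Carnap's parameter (lam >= 0), the "index of caution"
    """
    if lam == 0 and N == 0:
        raise ValueError("with lam = 0 the rule is undefined before any observation")
    return (Fraction(n) + Fraction(lam) / K) / (Fraction(N) + Fraction(lam))


# Hypothetical evidence: 8 of 10 observed individuals fell in one of K = 2 cells.
for lam in (0, 2, 100, 10**6):
    print(lam, float(carnap_predictive(8, 10, 2, lam)))
```

With the parameter at 0 the value is the observed relative frequency 8/10; as the parameter grows, the value approaches the symmetric a priori value 1/2. Nothing in the formula itself singles out one value of the parameter, which is the point of thesis (4).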

Among other things these observations imply that a rational epistemologist must in principle be always prepared to change his or her assignments of a priori probabilities and of absolute information in the light of experience. This means also a rejection of strict Bayesianism as a general approach to epistemology. For the defining characteristic of strict Bayesianism is precisely the use of conditionalization as the only means of belief change (redistribution of probabilities on evidence). For the same reason, second-order probabilities are an unavoidable ingredient in any comprehensive theory of the use of probabilities in epistemology and philosophy of science.¹⁴ Such second-order probabilities can inter alia govern the rational change of indexes of caution like Carnap’s $\lambda$.

¹² Cf. here Walk, K., “Simplicity, Entropy and Inductive Logic”, Aspects of Inductive Logic, Hintikka, J. and Suppes, P., eds., Amsterdam: North-Holland, 1966, pp. 66-80.

¹³ See here Hintikka, J., “A Two-Dimensional Continuum of Inductive Methods”, op. cit. note 12 above, pp. 113-132; Hintikka, J. and Niiniluoto, I., “An Axiomatic Foundation for the Logic of Inductive Generalization”, Studies in Inductive Logic and Probability II, Jeffrey, R. C., ed., Berkeley and Los Angeles: University of California Press, 1980, pp. 157-181.

(5) The use of information as a goal in epistemology (e.g. in the choice of hypotheses and theories) is compatible with the use of inductive probabilities.

Here I am expressing an insight which Sören Halldén has also expressed clearly and forcefully.¹⁵ Karl Popper has made much of the inverse relationship between information and probability discussed in (3) above. His ideas involve nevertheless a fundamental flaw. What are inversely related to each other are a priori probability and absolute information. No one has ever advocated the use of a priori probabilities as a guide to theory choice, or anything else, for that matter. What every decision theorist worth his or her utility matrix does is to use posterior probabilities (probabilities on evidence). And when some other utility is being maximized, what is most commonly maximized is its expected value. This expected or average value is of course averaged with respect to probabilities on evidence.

Thus a theory choice guided by a quest for high information does not have to involve the bold inspired leap of faith that Popper advocates. Information can be used in epistemology as a utility in the sense of decision theory.¹⁶ And not only can it be so used. It has been used extensively in this spirit by philosophers of science like Isaac Levi.¹⁷ Personally, I believe that there still are plenty of unused opportunities in this direction, even though I will not try to expound them here.

¹⁴ Second-order probabilities have been criticized by Bayesians like Savage, but in a wider perspective they are indispensable for a satisfactory theory of probabilistic support. Cf. here Sahlin, N.-E., “On Second Order Probabilities and the Notion of Epistemic Risk”, Foundations of Utility and Risk Theory with Applications, Stigum, B. P. and Wenstøp, F., eds., Dordrecht: D. Reidel, 1983, pp. 95-104; Hintikka, J., “Unknown Probabilities, Bayesianism and de Finetti’s Representation Theorem”, In Memory of Rudolf Carnap, Buck, R. C. and Cohen, R. S., eds., Dordrecht: D. Reidel, 1971, pp. 325-341.

¹⁵ Among other places in Halldén, S., The Step into Twilight, Stockholm: Thales, 1994, pp. 40-44; in Sannolikhetens logik, Lund: Gleerup, 1973, pp. 113-114; and in The Strategy of Ignorance, Stockholm: Thales, Library of Theoria, 17, 1986, chapter 7.

¹⁶ Cf. here, e.g., Hintikka, J. and Pietarinen, J., “Semantic Information and Inductive Logic”, op. cit. note 12 above, pp. 96-112.
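How information can serve as a utility while the averaging is done with probabilities on evidence can be illustrated with a small calculation. The sketch below rests on one particular utility assignment, adopted here purely as an assumption: accepting a hypothesis h gains cont(h) = 1 - P(h) if h is true and loses cont(not-h) = P(h) if h is false. The hypothesis names and the numbers are hypothetical.

```python
def cont(prior):
    """Content measure 1 - P(h), computed from the a priori probability."""
    return 1.0 - prior


def expected_content_utility(prior, posterior):
    """Expected utility of accepting h, averaged with probabilities on evidence.

    Assumed utilities (illustrative only): gain cont(h) if h is true,
    lose cont(not-h) if h is false.
    """
    gain_if_true = cont(prior)
    loss_if_false = cont(1.0 - prior)
    return posterior * gain_if_true - (1.0 - posterior) * loss_if_false


# Hypothetical hypotheses: (a priori probability, probability on evidence).
hypotheses = {
    "bold_theory": (0.05, 0.60),
    "cautious_theory": (0.70, 0.90),
}
for name, (prior, posterior) in hypotheses.items():
    print(name, round(expected_content_utility(prior, posterior), 3))
```

Under these assumed utilities the expected value reduces algebraically to the posterior probability minus the prior probability. An a priori improbable, hence highly informative, hypothesis can therefore come out ahead precisely because the evidence supports it, with no leap of faith involved.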

It is nevertheless important to realize that in this way we cannot hope to find one single epistemologically or logically privileged method of scientific or more generally speaking epistemological inference. For the probabilities needed cannot according to (4) be known fully a priori.

(6) Nonzero inductive probabilities can be associated with strict generalizations also in infinite universes (domains).

Here the notion of information can be of heuristic help. For it is not unnatural to think of degrees of information as being associated with strict generalizations (quantified propositions) independently of the universe one is considering. A modicum of logical acumen even suffices to find out how this can be done.¹⁸ All we need is a good grasp of how we can express in a given formal language more and more finely defined mutually exclusive alternatives. For this purpose, what have been called distributive normal forms offer an obvious implementation.

Once again, even some prominent philosophers have fallen prey to misconceptions. I have heard it argued that one cannot associate probabilities with strict generalizations in an infinite universe because it does not make sense to bet on generalizations, one alleged reason being that one would have to wait infinitely long to find out whether one has won or lost. But quite apart from questions whether such bets make sense or not, the argument is fallacious. For what has to be interpreted behavioristically is an entire probability distribution, an assignment of probabilities to all propositions, particular as well as general. Now it is easy to see that the differences between probability distributions that assign nonzero priors to generalizations and those which do not show up already in the probabilities which the agent assigns to particular propositions on particular evidence. Behavioristically speaking, whether or not one is committed to offering nonzero odds for a bet on the truth of strict generalizations is fully betrayed by the odds one is offering on the truth of finite propositions on finite evidence. Hence even if one wants to operationalize and behaviorize all use of probabilities in betting-theoretical terms, one does not have a basis for objecting to the assignment of nonzero priors to strict generalizations.

¹⁷ In a series of books beginning with Gambling with Truth, London: Routledge and Kegan Paul, 1967.

¹⁸ See note 13 above and cf. my papers in Information and Inference, Hintikka, J. and Suppes, P., eds., Dordrecht: D. Reidel, 1970.
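The claim that a nonzero prior for a strict generalization already shows up in the odds offered on finite propositions can be checked on a toy model. The model is an assumption chosen for simplicity and is not the inductive system discussed in the text: one agent uses the rule of succession, which in an infinite universe gives the generalization “all individuals are A” the prior 0; the other puts a hypothetical weight `w` on that generalization and follows the rule of succession with the remaining weight.

```python
from fractions import Fraction


def laplace_next(n):
    """P(next is A | n A's observed, no exceptions) under the rule of succession."""
    return Fraction(n + 1, n + 2)


def mixture_next(n, w):
    """The same predictive probability when prior weight w sits on 'all are A'.

    The remaining weight 1 - w follows the rule of succession, under which
    n A's in a row have probability 1 / (n + 1).
    """
    w = Fraction(w)
    likelihood_rest = Fraction(1, n + 1)
    # Bayes: posterior weight of the generalization after n unbroken A's.
    posterior_general = w / (w + (1 - w) * likelihood_rest)
    return posterior_general + (1 - posterior_general) * laplace_next(n)


for n in (0, 1, 5, 20):
    print(n, float(laplace_next(n)), float(mixture_next(n, Fraction(1, 10))))
```

The two agents offer different odds on the very next individual at every finite stage, so an observer can tell from finite bets alone which of them gives the generalization a nonzero prior.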

(7) There are several different kinds of information which can serve as utilities to be maximized in an epistemic decision.

For instance, in my contribution to the 1968 Amsterdam congress¹⁹ I distinguished attempts to maximize the substantial information of a hypothesis or theory from attempts to maximize the information which it shares with the data. The former kind of endeavor is characteristic of scientific theorizing, the latter of historical accounts. In one type of decision, the scientist might aim at a theory which would be potentially explanatory of different kinds of data whereas in another situation the only aim is to account for a given set of data. Different types of scientific and scholarly activities can thus be distinguished from each other by means of the kind of information they aim at. They can all of them be perfectly legitimate enterprises epistemologically. Yet they employ different methodologies because they aim at different “utilities”. Hence there cannot be just one set of methodological rules for all epistemic or even scientific endeavors.

In general, logarithmic measures of information, for instance measures connected with entropy, relate more to the surprise value of a proposition whereas such expressions as 1-P(S) can be said to measure informative content.
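The difference between the two kinds of measure can be made concrete. In this sketch the content measure is 1 - P(S) as in the text, while the choice of -log2 P(S) as the logarithmic measure, and the names `cont` and `inf`, are assumptions made for illustration.

```python
import math


def cont(p):
    """Informative content: 1 - P(S)."""
    return 1.0 - p


def inf(p):
    """Logarithmic ("surprise") measure: -log2 P(S), in bits (assumed form)."""
    return -math.log2(p)


for p in (0.5, 0.25, 0.01):
    print(p, cont(p), inf(p))

# For probabilistically independent S and T, P(S & T) = P(S) * P(T),
# so the logarithmic measure is additive while the content measure is not.
p_s, p_t = 0.5, 0.25
print(inf(p_s * p_t), inf(p_s) + inf(p_t))
print(cont(p_s * p_t), cont(p_s) + cont(p_t))
```

Both measures grow as the probability falls, but the content measure is bounded by 1, whereas the logarithmic measure grows without bound for ever more surprising propositions.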

¹⁹ Hintikka, J., “The Varieties of Information and Scientific Explanation”, Logic, Methodology and Philosophy of Science III, van Rootselaar, B. and Staal, J. F., eds., Amsterdam: North-Holland, 1968, pp. 311-331.