Simplicity, Truth, and Ockham's Razors
Kevin T. Kelly and Hanti Lin
Department of Philosophy, Carnegie Mellon University


Simplicity, Truth, and Ockham's Razors. Kevin T. Kelly and Hanti Lin, Department of Philosophy, Carnegie Mellon University. Currently supported by a 3-year grant from the Templeton Foundation.

Underdetermination. Which theory is true? Constant? Linear? Quadratic?

Ockham's Razor. Choose the simplest theory compatible with experience! Constant, linear, quadratic.

Skepticism. Constant, linear, quadratic.

Two Linked Questions. 1. What is simplicity? 2. How does Ockham's razor help us find true theories*?

*Theories vs. Models. Models predict actual observations. Theories guide counterfactual predictions. (Energy vs. accurate predictor.)

Methodology Gap. 1. Model selection: use theories to obtain accurate predictions that may become very inaccurate if you act on them. 2. Bayesianism: assign high credence to laws and theories, but ask only whether you think they would be accurate. Welcome to the real problem of induction. Give up.

1. SIMPLICITY

What is Simplicity? Brevity? Parsimony? Computational compressibility? Unity? Explanatory power (likelihood)? Testability? Low dimensionality? Fewer causes or entities? A contextual mess?

Proposed Answer. An axiomatic theory of simplicity relative to a theory-choice problem. Data mining = question mining: a unique simplicity order in many problems.

Possible Worlds. A set W of possible worlds.

Information States. A collection I of information states such that: 1. W is the vacuous information state; 2. each true conjunction of information states is entailed by a true information state.

Information Topology. Let V be the closure of I under arbitrary disjunction. Then (W, V) is a topological space with basis I.

Question. A question Q is a partition of W into answers H, H′, .... H_w = the unique answer true in world w.

Problem. P = (W, I, Q).

Interior Worlds. Isolated from the complementary answer. No problem of induction.
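The definitions of a problem P = (W, I, Q) and of interior worlds can be made concrete in a small finite sketch. The following Python fragment is illustrative only: the world encoding, the function names, and the toy problem (the law-vs-catchall example truncated to finitely many worlds) are assumptions, not part of the slides.

```python
# A minimal finite sketch of a problem P = (W, I, Q).

def answer_of(w, Q):
    """Return the unique answer (cell of the partition Q) containing world w."""
    return next(H for H in Q if w in H)

def is_interior(w, I, Q):
    """w is interior iff some information state true of w entails w's answer,
    i.e. the answer can be verified and there is no problem of induction at w."""
    H = answer_of(w, Q)
    return any(w in E and E <= H for E in I)

# Toy problem: will a binary experiment always yield 0?
# Worlds: None = "all zeros forever"; k = "first 1 appears at stage k".
W = {None, 0, 1, 2}
Q = [{None}, {0, 1, 2}]          # answers: "always 0" vs. "not always 0"
# Information states: "n zeros seen so far" plus states verifying a deviation.
I = [{None, 0, 1, 2}, {None, 1, 2}, {None, 2}, {0}, {1}, {2}]

print(is_interior(0, I, Q))      # True: a deviation is verified once observed
print(is_interior(None, I, Q))   # False: "always 0" is never verified
```

Deviation worlds are interior (their answer is eventually verified), while the all-zeros world sits on a boundary, which is exactly the problem of induction the next slides formalize.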
Boundary Worlds. No answer is ever verified. Problem of induction.

Boundary Answers. Problem of induction everywhere. Purple, teal, and green are boundary answers. In infinite dimensions, all answers are boundaries.

Nested Boundaries. The origin is in the boundary of the axis. The axis is in the boundary of the plane.

Skeptical Arrows. S(w, X) ⇔ w ∈ cl(X \ H_w).

Benign Arrows. B(w, X) ⇔ 1. w ∈ cl(X); 2. w ∉ cl(X \ H_w); 3. w ∉ X.

Arrows. A(w, X) ⇔ S(w, X) ∨ B(w, X).

Arrows Between Possibilities.
S(D, D′) ⇔ (∃w ∈ D) S(w, D′);
B(D, D′) ⇔ (∃w ∈ D) B(w, D′);
A(D, D′) ⇔ (∃w ∈ D) A(w, D′).

Coarse-Graining the Problem. Let F partition W. Elements of F are coarse-grained possibilities. We want to represent all the problems of induction in P with arrows between possibilities. Then we call F a factorization of P.

Polynomial Degree. y = a; y = bx + a; y = cx² + bx + a. It looks like curvature is intrinsically more complex. Pure monomials: {1}; {3}, {1, 3}, {2, 3}, {1, 2, 3}.

Transitions Tell a Different Story! It looks like curvature is intrinsically more complex, but higher degrees are more complex only because they allow for additional monomials: {2}, {1, 2}.

Two Definitions. F|E = the restriction of F to E. Min(F, E) = the set of all elements of F|E that have no arrows coming in.

Factorization Axioms.
Axiom 1. All worlds in D have the same, unique arrow status toward D′.
Axiom 2. For each possibility D, there is information E such that D ∩ E is in Min(F, E).
Axiom 3. No D in Min(F, E) has an arrow to the disjunction of all possibilities in Min(F, E).
Axiom 4. If there is an arrow from a possibility D to a second possibility D′ conjoined with answer H, then D′ ⊆ H.

Formalized.
Ax 1. ∀w, w′ ∈ D: [S(w, D′) ⇒ S(w′, D′) ∧ ¬B(w′, D′)] ∧ [B(w, D′) ⇒ B(w′, D′) ∧ ¬S(w′, D′)].
Ax 2. (∃E ∈ I) D ∩ E ∈ Min(F, E).
Ax 3. D ∈ Min(F, E) ⇒ ¬A(D, ⋃Min(F, E)).
Ax 4. H ∈ Q ∧ A(D, D′ ∩ H) ⇒ D′ ⊆ H.
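On the same finite encoding, the skeptical and benign arrow definitions can be computed directly; the closure operator cl is derived from the basis I as in the information-topology slide. This is a sketch under assumed names and an assumed toy problem (the law-vs-catchall example truncated to finitely many worlds), not the authors' implementation.

```python
def cl(X, W, I):
    """Closure of X: worlds whose every basic information state meets X."""
    return {w for w in W if all(w not in E or (E & X) for E in I)}

def H_of(w, Q):
    """The unique answer H_w containing w."""
    return next(H for H in Q if w in H)

def skeptical(w, X, W, I, Q):
    """S(w, X): w lies in cl(X \\ H_w), the mark of a problem of induction."""
    return w in cl(X - H_of(w, Q), W, I)

def benign(w, X, W, I, Q):
    """B(w, X): w is in cl(X), not in cl(X \\ H_w), and not in X itself."""
    return (w in cl(X, W, I)
            and w not in cl(X - H_of(w, Q), W, I)
            and w not in X)

# Toy worlds: None = "always 0"; k = "first 1 at stage k".
W = {None, 0, 1, 2}
Q = [{None}, {0, 1, 2}]
I = [{None, 0, 1, 2}, {None, 1, 2}, {None, 2}, {0}, {1}, {2}]

print(skeptical(None, {0, 1, 2}, W, I, Q))  # True: the catchall haunts "always 0"
print(skeptical(0, {None}, W, I, Q))        # False: no arrow back to "always 0"
```

The asymmetry of the two calls is the simplicity arrow of the law-vs-catchall example: the simple law points skeptically at the catchall, never the reverse.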
Mathematical Note. The axioms are a non-constructive way to invert the Cantor–Bendixson construction so that it applies to problems with all-boundary answers.

Strict Partial Order. Prop: Arrows are a strict partial order over possibilities. That is the simplicity order relative to P and F. Note: benign arrows can be necessary for transitivity!

Proof. Irreflexivity is immediate from acyclicity. For transitivity, suppose there are arrows from D to D′ and from D′ to D″. Then, by the arrow characterization, D is in the closure of D′ and D′ is in the closure of D″. Let w be in D. Then each neighborhood O of w catches a point w′ in D′. But since w′ is in the closure of D″, it follows that O catches a point in D″. So w is in the closure of D″. By acyclicity, D is distinct from D″. So there is an arrow from w to D″. So there is an arrow from D to D″.

Q is Decidable Within Possibilities. Prop: Each answer to Q is open within each possibility.

Uniqueness Axiom. Axiom 5: If there is an answer H to Q such that D has an arrow to D′ ∩ H, then D′ ⊆ H.

Restriction. P|E = the restriction of P to E. F|E = the restriction of F to E.

Preservation Theorem. Theorem: Let E be an information state. Then F factorizes P ⇒ F|E factorizes P|E.

Uniqueness Condition for Finite Questions. Prop: Suppose that Q factors P and Q is finite. Then Q is the unique, coarsest factorization of P.

A Uniqueness Condition for Infinite Questions. Prop: Suppose that the question Q of P is a coarsest factorization of P and that no two distinct answers have identical sets of skeptical descendants. Then Q is the unique coarsest factorization of P.

EXAMPLE: LAW VS. CATCHALL

Law vs. Catchall. Question: Will the binary experiment always yield 0? Information: past outcomes. Unique Coarsest Factorization: (diagram).
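The strict-partial-order proposition can be spot-checked on a finite simplicity order. Here the order on zero-patterns of parameters from the free-parameters example is used, with the arrow taken as "strictly more parameters pinned to zero" (an illustrative encoding, not the slides' own):

```python
from itertools import combinations

def is_strict_partial_order(elems, arrow):
    """Verify irreflexivity and transitivity, as the proposition asserts."""
    if any(arrow(d, d) for d in elems):
        return False
    return all(arrow(a, c)
               for a in elems for b in elems for c in elems
               if arrow(a, b) and arrow(b, c))

params = ("a", "b", "c")
cells = [frozenset(z) for r in range(len(params) + 1)
         for z in combinations(params, r)]

# Arrow from D to D2 when D forces strictly more parameters to zero:
# the simpler cell lies in the boundary of the more complex one.
arrow = lambda D, D2: D > D2     # proper-superset of zero-sets

print(is_strict_partial_order(cells, arrow))  # True
```

A relation such as mere distinctness fails the same check, since distinctness is not transitive, which illustrates what the proposition rules out.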
EXAMPLE: POINT HYPOTHESIS

Point Hypothesis. Question: Is the true value of the parameter equal to 0? Information: open intervals. Unique Coarsest Factorization.

Coding is Irrelevant. Epistemic Equivalence: different surface topologies, isomorphic coarsest factorizations, isomorphic epistemology!

Quotient Problem. P/F = (W/F, I/F, Q/F), where W/F is the set of coarse-grained possibilities, E/F = {D ∈ W/F : D is compatible with E}, I/F = {E/F : E ∈ I}, and Q/F = {{D} : D ∈ W/F}.

Objective Empirical Effects. Information that eliminates a simplest possibility. Relevance is driven by problem structure, not personal prior probability.

EXAMPLE: HALF-LINE

Half Line. Question: Is the true value of the parameter ≥ 0? Riding on Induction. Transition to Deduction. Ambiguous Representation: the bottom cell violates arrow homogeneity (skeptical arrow). Unique Coarsest Factorization: refine into arrow-homogeneous possibilities (benign arrow, skeptical arrow).

MAXWELL

Maxwell. Morrison (2000) notes that Maxwell's equations historically received no support from their unifying power. Maxwell vs. Ampère & Fresnel. The Displacement Current Term: radio works vs. radio doesn't work. Ad Hoc Maxwell: Maxwell; Ampère & Fresnel; Maxwell & Fresnel. Ockham Before Hertz: Maxwell; Ampère & Fresnel; Maxwell + Fresnel?

EXAMPLE: FREE PARAMETERS

Parameter Space. Question: Which parameters are 0? Information: open balls. Possibilities: (0, 0), (x, 0), (0, y), (x, y). Unique Coarsest Factorization.

Ternary Case. Possibilities: (0,0,0); (x,0,0), (0,y,0), (0,0,z); (x,y,0), (x,0,z), (0,y,z); (x,y,z).

Dimensionality. Question: the number of free parameters. Unique coarsest factorization.

EXAMPLE: TRIGONOMETRIC VS. PERIODIC

Trigonometric vs. Periodic. Is the phenomenon polynomial or periodic? Polynomials vs. Trig Polynomials: unique coarsest factorization.
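The ternary free-parameter lattice can be generated programmatically: identify each possibility with its pattern of pinned-to-zero coordinates, take the simplicity arrow as strict inclusion of zero-sets, and read off the minimal (simplest) cells. The encoding and names below are illustrative assumptions.

```python
from itertools import product

# Possibilities for three parameters: a pattern like ('0', 'y', '0')
# marks which coordinates are pinned to zero and which are free.
names = ("x", "y", "z")
cells = [tuple(n if free else "0" for n, free in zip(names, bits))
         for bits in product((False, True), repeat=3)]

def zeros(cell):
    return {i for i, v in enumerate(cell) if v == "0"}

def arrow(D, D2):
    """Simplicity arrow: D pins strictly more parameters to 0 than D2."""
    return zeros(D) > zeros(D2)

minimal = [c for c in cells if not any(arrow(d, c) for d in cells if d != c)]
print(minimal)      # [('0', '0', '0')]: the origin is uniquely simplest
by_dim = sorted(cells, key=lambda c: 3 - len(zeros(c)))
print(by_dim[-1])   # ('x', 'y', 'z'): the fully free cell is most complex
```

Sorting by the number of free parameters recovers the dimensionality question from the slide: simplicity rank here is just the count of free parameters.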
Polynomials vs. Trig Polynomials. Answers: Σ_{i ≤ n} a_i x^i versus Σ_{i ≤ n} (a_i cos(ix) + b_i sin(ix)), for n = 0, 1, 2, .... Hidden arrows to infinite disjunctions: the chains n₀, n₁, n₂, ... on the polynomial side and on the periodic side.

Polynomials vs. Trig Polynomials. Clumping possibilities (e.g., all the periodic answers n₀, n₁, n₂, ... into one) violates Axiom 4.

Taylor vs. Fourier. Not asymptotically prime: there are always two minimal elements, Σ_{i ≤ n} a_i x^i and Σ_{i ≤ n} (a_i cos(ix) + b_i sin(ix)).

EXAMPLE: CAUSAL NETWORK SEARCH

Causal Network Search. Causal order without experiments! (Reichenbach, Pearl, Spirtes, Glymour, Scheines, etc.) In a chain X → Z → Y, X gives some information about Y, but not if you know Z. In a collider X → Z ← Y, X gives no information about Y, but does if you know Z.

Causal Network Search. Possible information: detected conditional statistical dependencies.
Unique Coarsest Factorization: (diagram of candidate causal networks over X, Y, Z).

Focus on the YZ Edge: (diagram).

Further Collapse: (diagram).

Ignore Transitive Arrows: (diagram).

Rearrange: (diagram).

A Coarsest Factorization: (diagram). Note the essential role of benign arrows.
REFINED FACTORIZATIONS

Benign Arrows as Confirmation. Greedy Ockham leaps with no data when the factorization is coarse: Ock_F(E). Refined factorizations delay induction. From the outside, it looks like waiting for confirmation.

Model Scoring Rules. AIC, BIC, etc. pick out an array of benign points.

Benign Arrows as Skeptical Scoring Rules. Model scoring rules are greedy. Pulling down benign points adds skeptical zones in which weaker answers are produced.

2. OCKHAM'S RAZORS AND THEIR JUSTIFICATIONS

What Connects Simplicity with Truth? God? Magic? Evolution? Idealism? Prior bias? Predictive accuracy? Wishful thinking?

Scientific Realism Debate. Realism: simplicity is rhetorically compelling. Anti-realism: but how could it help us find the true theory?

Platonic Dilemma. If you already know that the true theory is simple, you don't need Ockham's razor. If you don't already know that the true theory is simple, how could a fixed bias toward simplicity help you find it?

Non-Responses. Give up on true theories (a.k.a. counterfactual predictions from non-experimental data). Fail to show how Ockham is better at finding true theories. Assume a (circular) bias toward simplicity.

Proposal. Prove, without assuming a prior bias toward simplicity, that various senses of Ockham's razor optimize various senses of truth-conduciveness. Avoids error fastest! Nobody knows more, faster! Retracts least! Methodology awards.

METHODS AND INQUIRY

Information Streams. An information stream ε for w is an infinite sequence of information states for w such that: 1. ε is ordered by entailment; 2. each information state for w is entailed by some state in ε.

Initial Segments. ε|n = the initial segment of ε of length n. Seq = {ε|n : Strm(w, ε) ∧ n < ω}.

Information Content. [ε] = {w ∈ W : Strm(w, ε)}. [e] = the last entry in e. Prop: [ε] ⊆ [ε|n]. Proof: by soundness of information streams.
Forward and Back. Concatenation: (E₁, ..., E_n) * E = (E₁, ..., E_n, E). Decrement: 1. ()⁻ = (); 2. (e * E)⁻ = e.

Methods. A method B in F takes each finite information sequence e to a disjunction of possibilities in F|[e].

Simplicity Implies Global Determination. Prop: Let Strm(ε) hold and let F factor P. Then there is a unique D ∈ F such that [ε] ⊆ D. So define D_ε = the unique D ∈ F such that [ε] ⊆ D. Proof: Since F is a partition and [ε] ≠ ∅ by definition, there is at most one D ∈ F such that [ε] ⊆ D. Suppose that there is no D ∈ F such that [ε] ⊆ D. Then there exist distinct D, D′ and distinct w, w′ such that w ∈ D, w′ ∈ D′, and w, w′ ∈ [ε]. So ε is for both w and w′. So w ∈ bdry(D′) and w′ ∈ bdry(D). So there is a cycle in F. Contradiction.

Solving F in the Limit. B solves F from e ⇔ for each information stream ε extending e and for all but finitely many n ≥ length(e): [ε] ⊆ B(ε|n) ⊆ D_ε. (The total information presented in ε; the output on an initial segment of ε; the possibility true of the world ε presents.)

MEASURES OF TRUTH CONDUCIVENESS

Inquiry Losses. λ(B, ε) = 1 if B fails to solve F in ε, and = 0 otherwise. Further losses: elapsed time to error-avoidance, elapsed time to error-free convergence to a disjunction within H, and the total number of retractions prior to convergence.

Truth Modulus. τ(B, ε) = the total time to error-avoidance in ε.

Content Modulus. γ_H(B, ε) = the least n such that for all m ≥ n, B(ε|m) ⊆ H.

Knowledge Modulus. κ_H(B, ε) = max(γ_H(B, ε), τ(B, ε)).

Retractions. ρ(B, ε) = the number of distinct n such that B(ε|n+1) ⊄ B(ε|n).

OPTIMAL TRUTH-CONDUCIVENESS

Loss in General. λ(B, ε) = a generic loss of inquiry.

Coarse-Grained Loss. λ(B, D, e) = sup λ(B, ε) over ε ∈ Strm(D, e).

Coarse-Grained Comparisons. B ≤_e B′ ⇔ (∀D ∈ F|e) λ(B, D, e) ≤ λ(B′, D, e).

Sub-problem Comparisons. Let {≤_e : e is a finite information sequence} be an indexed collection of partial orders on methods. Think of ≤ as a mapping ≤_(.) from e to ≤_e.

Equivalence and Strict Order. B ≈_e B′ ⇔ B ≤_e B′ ∧ B′ ≤_e B; B <_e B′ ⇔ B ≤_e B′ ∧ ¬(B′ ≤_e B).

Lexicographic and Pareto Comparisons. Lexicographic: B (≤₁ ⋆ ≤₂)_e B′ ⇔ B <₁,e B′ ∨ (B ≈₁,e B′ ∧ B ≤₂,e B′). Pareto: B (≤₁ × ≤₂)_e B′ ⇔ B ≤₁,e B′ ∧ B ≤₂,e B′.
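The retraction loss can be counted directly from a method's output sequence. In the sketch below, disjunctions of possibilities are encoded as Python sets, and a retraction is read as "the new output is not included in the old one" (one natural reading of the retraction clause; the encoding and names are assumptions):

```python
def retractions(outputs):
    """Count stages where the new disjunction of possibilities is not
    contained in the previous one, i.e. earlier content is taken back."""
    return sum(1 for prev, new in zip(outputs, outputs[1:])
               if not new <= prev)

# A method that bets on the simplest answer, hedges when refutation
# looms, and finally settles: exactly one weakening step counts.
run = [{"always0"}, {"always0"}, {"always0", "deviates"}, {"deviates"}]
print(retractions(run))  # 1
```

Note that narrowing the disjunction (the last step) costs nothing; only the weakening step is a retraction, which is what lets Ockham methods minimize this loss.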
Lexicographic Comparisons. B (≤₁ ⋆ ≤₂)_e B′ ⇔ B <₁,e B′ ∨ (B ≈₁,e B′ ∧ B ≤₂,e B′).

Abbreviations. In the proofs below, write ≤₁ for ≤₁,e and ≤₂ for ≤₂,e.

Prop. B (≤₁ ⋆ ≤₂)_e B′ ⇔ B <₁ B′ ∨ (B ≤₁ B′ ∧ B ≤₂ B′).

Proof of ⇒.
1. Suppose B <₁ B′ ∨ (B ≈₁ B′ ∧ B ≤₂ B′).
2. Case: B <₁ B′. Then B <₁ B′ ∨ (B ≤₁ B′ ∧ B ≤₂ B′).
3. Case: ¬(B <₁ B′). So B ≈₁ B′ ∧ B ≤₂ B′, by 1.
4. So B ≤₁ B′ ∧ B ≤₂ B′, by 3.
5. So B <₁ B′ ∨ (B ≤₁ B′ ∧ B ≤₂ B′).

Proof of ⇐.
1. Suppose B <₁ B′ ∨ (B ≤₁ B′ ∧ B ≤₂ B′).
2. Case: B <₁ B′. Then B (≤₁ ⋆ ≤₂)_e B′, by definition.
3. Case: ¬(B <₁ B′). So B ≤₁ B′ ∧ B ≤₂ B′, by 1.
4. So B′ ≤₁ B, by 3 and the definition of <₁.
5. So B ≈₁ B′, by 3, 4.
6. So B ≈₁ B′ ∧ B ≤₂ B′, by 3, 5.
7. So B (≤₁ ⋆ ≤₂)_e B′.

Notation. B (<₁ ⋆ <₂)_e B′ ⇔ B (≤₁ ⋆ ≤₂)_e B′ ∧ ¬(B′ (≤₁ ⋆ ≤₂)_e B). B (≈₁ ⋆ ≈₂)_e B′ ⇔ B (≤₁ ⋆ ≤₂)_e B′ ∧ B′ (≤₁ ⋆ ≤₂)_e B.

Strict Lexicographic Order. Prop: B (<₁ ⋆ <₂)_e B′ ⇔ B <₁ B′ ∨ (B ≈₁ B′ ∧ B <₂ B′).

Proof of ⇒.
1. Suppose B (<₁ ⋆ <₂)_e B′.
2. So B (≤₁ ⋆ ≤₂)_e B′ and ¬(B′ (≤₁ ⋆ ≤₂)_e B).
3. So B <₁ B′ ∨ (B ≈₁ B′ ∧ B ≤₂ B′), and ¬(B′ <₁ B) ∧ ¬(B′ ≈₁ B ∧ B′ ≤₂ B).
4. Case: B <₁ B′. Then the conclusion holds.
5. Case: ¬(B <₁ B′). So B ≈₁ B′ ∧ B ≤₂ B′, by 3.
6. So ¬(B′ ≤₂ B), by 3, 5 (since B′ ≈₁ B).
7. So B <₂ B′, by 5, 6.
8. So B ≈₁ B′ ∧ B <₂ B′, by 5, 7.

Proof of ⇐.
1. Suppose B <₁ B′ ∨ (B ≈₁ B′ ∧ B <₂ B′).
2. So B (≤₁ ⋆ ≤₂)_e B′, by the definition of ⋆.
3. Case: B <₁ B′. Then ¬(B′ <₁ B) and ¬(B′ ≈₁ B), so ¬(B′ (≤₁ ⋆ ≤₂)_e B).
4. Case: B ≈₁ B′ ∧ B <₂ B′. Then ¬(B′ <₁ B) and ¬(B′ ≤₂ B), so ¬(B′ (≤₁ ⋆ ≤₂)_e B).
5. So B (<₁ ⋆ <₂)_e B′ in either case.

Lexicographic Equivalence. Prop:
B (≈₁ ⋆ ≈₂)_e B′ ⇔ B ≈₁ B′ ∧ B ≈₂ B′.

Proof of ⇒.
1. Suppose B (≈₁ ⋆ ≈₂)_e B′.
2. So B (≤₁ ⋆ ≤₂)_e B′ ∧ B′ (≤₁ ⋆ ≤₂)_e B.
3. So (B <₁ B′ ∨ (B ≈₁ B′ ∧ B ≤₂ B′)) ∧ (B′ <₁ B ∨ (B′ ≈₁ B ∧ B′ ≤₂ B)).
4. Suppose ¬(B ≈₁ B′).
5. Then B <₁ B′ ∧ B′ <₁ B, by 3. Contradiction.
6. So B ≈₁ B′, by 4, 5.
7. So B ≤₂ B′ ∧ B′ ≤₂ B, by 3, 6; that is, B ≈₂ B′.

Proof of ⇐.
1. Suppose B ≈₁ B′ ∧ B ≈₂ B′.
2. So (B ≈₁ B′ ∧ B ≤₂ B′) ∧ (B′ ≈₁ B ∧ B′ ≤₂ B).
3. So B (≤₁ ⋆ ≤₂)_e B′ ∧ B′ (≤₁ ⋆ ≤₂)_e B; that is, B (≈₁ ⋆ ≈₂)_e B′.

States of Inquiry. Let e ∈ Seq. Let e⁻ be one step shorter than e (unless e = ()). Let u be a potential methodological response to e⁻. Then (e, u) is a state of inquiry.

Meth(e, u). Let B(e) = (B(e|0), ..., B(e|(lh(e) − 1))). If (e, u) ∈ SI, define Meth(e, u) = {B ∈ Meth : B(e⁻) = u}. Prop: Let (e, u) ∈ SI. Then: 1. (e, B(e⁻)) ∈ SI; 2. B ∈ Meth(e, B(e⁻)). Abbreviate Meth(e, B) = Meth(e, B(e⁻)).

Optimality Relative to M at (e, u). Let M be a collection of methods and let (e, u) ∈ SI. B ∈ O(M, ≤, e, u) ⇔ 1. B ∈ M; 2. B produces u in response to e⁻; 3. (∀e′ ⊇ e)(∀B′ ∈ M that agrees with B along e⁻) B ≤_{e′} B′.

Admissibility Relative to M at (e, u). Let M be a collection of methods and let (e, u) ∈ SI. B ∈ A(M, ≤, e, u) ⇔ 1. B ∈ M; 2. B produces u in response to e⁻; 3. there is no e′ ⊇ e and no B′ ∈ M agreeing with B along e⁻ such that B′ <_{e′} B.

Optimality Implies Admissibility. Prop: (e, u) ∈ SI ⇒ O(M, ≤, e, u) ⊆ A(M, ≤, e, u). Proof: B ∈ O(M, ≤, e, u) ⇒ B ∈ M ∧ (∀d ⊇ e)(∀B′ ∈ M ∩ Meth(d, B)) B ≤_d B′ ⇒ B ∈ M ∧ ¬(∃d ⊇ e)(∃B′ ∈ M ∩ Meth(d, B)) B′ <_d B ⇒ B ∈ A(M, ≤, e, u).

La Crème de la Crème (et l'Acceptable de la Crème). Optimality and admissibility can now be iterated. For legibility, write: (O₁ O₂)(e, u) = O(O₁(e), ≤₂, e, u); (O₁ A₂)(e, u) = A(O₁(e), ≤₂, e, u), etc.

Distribution of O. Prop: O(≤₁ ⋆ ≤₂)(e, u) = (O₁ O₂)(e, u): lexicographic optimality is nested optimality.

Proof of the ⊇ direction.
1. Suppose B ∈ (O₁ O₂)(e, u).
2. So B ∈ O₁(e, u), and (∀d ⊇ e)(∀B′ ∈ O₁(e) ∩ Meth(d, B)) B ≤₂,d B′, by the definition of (O₁ O₂).
3. Let d ⊇ e and B′ ∈ Meth(d, B).
4. So B ≤₁,d B′, by 2.
5. Suppose B ≈₁,d B′.
6. Then B′ ∈ O₁(e, u), by 2, 5.
7. So B ≤₂,d B′, by 2, 6.
8. So B ≈₁,d B′ ⇒ B ≤₂,d B′, by 5, 7.
9. So B <₁,d B′ ∨ (B ≈₁,d B′ ∧ B ≤₂,d B′), by 4, 8.
10. So (∀d ⊇ e)(∀B′ ∈ Meth(d, B)) B (≤₁ ⋆ ≤₂)_d B′, by 3, 9.
11. So B ∈ O(≤₁ ⋆ ≤₂)(e, u), by the definition of O.

Proof of the ⊆ direction.
1. Suppose B ∈ O(≤₁ ⋆ ≤₂)(e, u).
2. So (∀d ⊇ e)(∀B′ ∈ Meth(d, B)) B (≤₁ ⋆ ≤₂)_d B′, by the definition of O.
3. So (∀d ⊇ e)(∀B′ ∈ Meth(d, B)) B <₁,d B′ ∨ (B ≈₁,d B′ ∧ B ≤₂,d B′).
4. So (∀d ⊇ e)(∀B′ ∈ Meth(d, B)) B ≤₁,d B′.
5. So B ∈ O₁(e, u).
6. Let d ⊇ e and B′ ∈ O₁(e) ∩ Meth(d, B).
7. Then B′ ≤₁,d B, by 6, so ¬(B <₁,d B′).
8. So B ≈₁,d B′ ∧ B ≤₂,d B′, by 3, 7.
9. So (∀d ⊇ e)(∀B′ ∈ O₁(e) ∩ Meth(d, B)) B ≤₂,d B′, by 6, 8.
10. So B ∈ (O₁ O₂)(e, u), by the definition of (O₁ O₂).

OCKHAM'S RAZORS DERIVED FROM OPTIMALITY

Solutions Characterized. Thm: O_λ(e, u) = the set of all methods that: 1. respond with u to e⁻; 2. eventually include every oldest simplest possibility; 3. eventually narrow the possibilities down to a single simplest one.

O_λ(e) Characterization. Proof: We already showed that Sol(e) = LR_s(e) ∩ LG(e) when F factors P. So Meth(e, u) ∩ Sol(e) = LR_s(e) ∩ LG(e) ∩ Meth(e, u). Suppose that B ∈ Meth(e, u) ∩ Sol(e). Let d ⊇ e and let B′ ∈ Meth(d, B). Let D ∈ F|d and ε ∈ Strm(D, e). Then [ε] ⊆ B(ε|n) ⊆ D_ε for cofinitely many n. So λ(B, ε) = 0 ≤ λ(B′, ε). So B ∈ O_λ(e, u). Hence, B ∈ A_λ(e, u). Now suppose that B ∉ Meth(e, u) ∩ Sol(e). Case: B ∉ Meth(e, u). Then immediately B ∉ A_λ(e, u). Case: B ∉ Sol(e). Since F factors P, let B′ ∈ Sol(e) ∩ Meth(e, B) (i.e., take any solution and make it agree with B along e⁻ to arrive at B′). Then argue as above that B′ ∈ O_λ(e, u). Furthermore, for some D ∈ F|e, λ(B, D, e) = 1. So B ∉ A_λ(e, u).

Nice but Not Enough. This allows for arbitrarily many violations of Ockham's razor in the short run.
Carnap: "[Reichenbach] has the merit of having first emphasized these important points: 1. The decisive justification of an inductive procedure does not consist in its plausibility ... but must refer to its success in some sense. 2. The fact that the truth of the predictions reached by induction cannot be guaranteed does not preclude a justification in a weaker sense. 3. It can be proved that induction leads in the long run to success in a certain sense."

Carnap: "[Reichenbach's convergence analysis] is an important step in the right direction, but only a first step. His rule, which he calls the rule of induction, is far from being the only one [that succeeds in his sense]. Therefore, we need a stronger method for comparing any two given rules of induction."

Error-Avoidance Optimality. Prop: (O_λ O_τ)(e, u) = (O_λ A_τ)(e, u) = the set of all methods that: 1. respond with u to e⁻; 2. henceforth entertain all simplest possibilities; 3. eventually narrow the possibilities down to a single simplest one.

Basic Idea. Every convergent method has unbounded τ in non-simplest possibilities. Ockham has τ = 0 in simplest possibilities. A violator has τ > 0 in some simplest possibility.

Error-Avoidance Optimality. (O_λ A_τ)(e, u) is empty in the poly/trig problem! It is non-empty in the other examples.

Retraction Optimality. Theorem: (O_λ O_ρ)(e, u) = (O_λ A_ρ)(e, u) = the set of all methods that: 1. respond with u to e⁻; 2. henceforth do not eliminate simplest possibilities until nature does; 3. henceforth do not retract until nature rules out all simplest possibilities compatible with the previous answer; 4. henceforth entertain all simplest possibilities after retracting; 5. eventually entertain some simplest possibilities; 6. eventually narrow the possibilities down to a single simplest one.

Basic Idea. Every convergent method can be forced to retract up the longest path to a possibility. Ockham retracts at most once per step.
A violator has an extra retraction in a possibility that is simplest at the time of the violation (and in all higher possibilities).

Retraction Optimality. Prop: (O_λ O_ρ)(e, u) ≠ ∅ in the poly/trig problem. (Diagram.)

(O_λ A_ρ)(e) Characterization. (Performance table.) Prop: Assume that F factors P. Then: (O_λ A_ρ)(e, u) = LR_snr(e) ∩ HR_s(e) ∩ HM_s(e) ∩ HM_pat(e) ∩ HG_w(e) ∩ Meth(e, u).

Knowledge Admissibility. Theorem: (O_λ A_{γ_H} A_{κ_H})(e, u) = the set of all methods that: respond with u to e⁻; henceforth entertain some simplest possibilities; henceforth rule out all non-simplest possibilities; henceforth entertain all simplest possibilities after retracting; from tomorrow, do not rule out simplest possibilities until nature does; from tomorrow, do not retract until nature rules out all simplest possibilities compatible with the previous answer.

Knowledge Admissibility. (O_λ O_{γ_H} O_{κ_H})(e, u) is empty in non-trivial problems: different methods can favor different possibilities.

(O_λ A_κ O_ρ)(e) Characterization. (Performance table.) Prop: Suppose that F factors P. Then: (O_λ A_κ O_ρ)(e, u) = (O_λ A_κ A_ρ)(e, u) = LR_s(e) ∩ HR_rstrt-w(e) ∩ HM_s(e) ∩ HM_pat(e) ∩ HG_w(e) ∩ Meth(e, u).

Knowledge then Retractions. Prop: (O_λ A_{γ_H} A_{κ_H} O_ρ)(e, u) = the set of all methods that: respond with u to e⁻; henceforth entertain some simplest possibilities; henceforth rule out all non-simplest possibilities; henceforth entertain all simplest possibilities after retracting; henceforth do not rule out simplest possibilities until nature does; henceforth do not retract until nature rules out all simplest possibilities compatible with the previous answer.

Knowledge then Retractions. (O_λ A_{γ_H} A_{κ_H} O_ρ)(e, u) can be empty.

How it Works. 1. Simplicity gives a winning strategy to Nature to force changes of opinion along simplicity paths.
2. That strategy allows one to establish lower bounds on loss in each possibility for arbitrary convergent methods. 3. Ockham methods meet those bounds. 4. Non-Ockham methods do worse in simple possibilities.

Some Observations. 1. Justification is finding the truth in the best possible way, even if the best possible performance is very weak. 2. Convergence in the limit is too weak for methodology: anything goes in the short run. 3. Short-run, worst-case reliability is too strong when induction is required: it results in inductive skepticism or circular arguments. 4. Convergence plus worst-case asymptotic losses yields surprisingly strong, short-run methodological principles for genuine inductive problems. 5. The recommendations depend on the losses one emphasizes.

Thanks!