Statistics between Inductive Logic and Empirical Science

Jan Sprenger

University of Bonn

Tilburg Center for Logic and Philosophy of Science

3rd PROGIC Workshop, Canterbury

I. The Logical Image of Statistics

Inductive Logic

Deductive logic discerns valid, truth-preserving inferences

P; P → Q ⇒ Q

Inductive logic generalizes that idea to non-truth-preserving inferences

P; P supports Q ⇒ (more) probably Q

Inductive Logic

Inductive logic: truth of premises indicates truth of conclusions

Inductive inference: objective and independent of external factors

Main concepts: confirmation, evidential support

The Logical Image of Statistics

Statistics infers from particular data to general models

Formal theory of inductive inference, governed by general, universally applicable principles

Separation of statistics and decision theory (statistics summarizes data in a way that makes a decision-theoretic analysis possible)

The Logical Image of Statistics

Statistics contains theoretical (mathematics, logic) as well as empirical elements (problem-based engineering of useful methods, interaction with "real science")

Where should statistics be located on the scale between logic and empirical science?

The Logical Image of Statistics

Pro: mathematical, "logical" character of theoretical statistics

Pro: mechanical character of a lot of statistical practice (SPSS & Co.)

Pro: connection between Bayesian statistics and probabilistic logic

Cons: presented in this work...

II. Parameter Estimation

A Simple Experiment

Five random numbers are drawn from {1, 2, ..., N} (N unknown): 21, 4, 26, 18, 12

What is the optimal estimate of N on the basis of the data?

That depends on the loss function!

Estimation and Loss Functions

Aim: estimated parameter value close to true value

Loss function measures distance between estimated and true value

Choice of loss function sensitive to external constraints
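
To make the notion concrete, here are a few textbook loss functions for an estimate \hat N of the true value N. None of them is named on the slides; they are included only as an illustration, and the weights a and b in the last one are hypothetical.

```latex
\begin{align*}
L_{0\text{-}1}(\hat N, N) &= \mathbf{1}[\hat N \neq N] \\
L_{\mathrm{abs}}(\hat N, N) &= |\hat N - N| \\
L_{\mathrm{sq}}(\hat N, N) &= (\hat N - N)^2 \\
L_{\mathrm{asym}}(\hat N, N) &=
  \begin{cases}
    a\,(N - \hat N) & \text{if } \hat N < N \\
    b\,(\hat N - N) & \text{if } \hat N \geq N
  \end{cases}
  \qquad a, b > 0
\end{align*}
```

The asymmetric loss shows how an external constraint can enter: if underestimating N is more costly than overestimating it, one would choose a > b, and the preferred estimate shifts upward.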

A Bayesian approach

Elicit prior distribution for the parameter N

Use incoming data for updating via conditionalization

Summarize data in a posterior distribution (credal set, etc.)

Perform a decision-theoretic analysis
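
The four steps can be sketched for the five draws of the simple experiment. This is only an illustration under assumptions that are not in the slides: the draws are taken to be independent and uniform (with replacement), the prior is uniform on {1, ..., 100} (the bound N_MAX is hypothetical), and the three loss functions are the usual 0-1, absolute, and squared losses.

```python
from fractions import Fraction

data = [21, 4, 26, 18, 12]   # the five draws from the experiment
N_MAX = 100                  # assumed upper bound of the prior (not in the source)

def posterior(data, n_max):
    """Posterior over N under a uniform prior on {1, ..., n_max}."""
    k, m = len(data), max(data)
    # Likelihood of k independent uniform draws: N**-k if N >= max(data), else 0.
    weights = {n: Fraction(1, n ** k) for n in range(m, n_max + 1)}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

post = posterior(data, N_MAX)

# Decision-theoretic step: each loss function selects a different summary.
mode = max(post, key=post.get)            # minimizes expected 0-1 loss

median = None                             # minimizes expected absolute loss
cumulative = Fraction(0)
for n in sorted(post):
    cumulative += post[n]
    if cumulative >= Fraction(1, 2):
        median = n
        break

mean = float(sum(n * p for n, p in post.items()))   # minimizes expected squared loss

print(mode, median, round(mean, 2))
```

Under these assumptions the posterior mode is 26, the median is 30, and the mean lies between 33 and 34. The posterior itself does not depend on the loss function; only the final estimate does.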

III. Model Selection

Model Selection

True model usually "out of reach"

Main idea: minimizing the discrepancy between the approximating and the true model

Discrepancy can be measured in various ways

cf. choice of a loss function

Kullback-Leibler divergence, Gauß distance, etc.
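
A small sketch of why the choice of discrepancy matters. The "true" distribution and the two candidates are invented for illustration, and the squared Euclidean distance is used only as a second measure to set against the Kullback-Leibler divergence; it is an assumption, not necessarily the "Gauß distance" mentioned above.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def squared_distance(p, q):
    """Squared Euclidean distance between two probability vectors."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

# Hypothetical true model and two approximating models over three outcomes.
true_model = [0.5, 0.3, 0.2]
candidates = {
    "A": [0.3, 0.4, 0.3],
    "B": [0.5, 0.45, 0.05],
}

best_by_kl = min(candidates, key=lambda name: kl_divergence(true_model, candidates[name]))
best_by_sq = min(candidates, key=lambda name: squared_distance(true_model, candidates[name]))

print(best_by_kl, best_by_sq)
```

In this example the K-L divergence prefers candidate A, while the squared distance prefers candidate B: B puts very little mass on the third outcome, which the K-L divergence penalizes heavily.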

Model Selection

Many model selection procedures focus on estimating the discrepancy between the candidate model and the true model

Choose the model with the lowest estimated discrepancy to the true model

That is easier said than done...

Problem-specific Premises

Asymptotic behavior

Small or large candidate model set?

Nested vs. non-nested models

Linear vs. non-linear models

Random error structure

Scientific understanding required to fix the premises!

Bayesian Model Selection

Idea: Search for the most probable model (or the model that has the highest Bayes factor)

Variety of Bayesian methods (BIC, intrinsic and fractional Bayes factors, ...)

Does Bayes show a way out of the problems?
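
As a minimal illustration of two of these tools, the sketch below compares a fair-coin model M0 (p = 0.5) with a model M1 whose bias p is unknown. The data (15 heads in 20 tosses) and the uniform prior on p under M1 are assumptions made for the example; they do not come from the slides.

```python
import math

n, heads = 20, 15            # hypothetical data: 15 heads in 20 tosses
tails = n - heads

# Exact Bayes factor of M1 (p ~ Uniform(0, 1)) against M0 (p = 0.5).
marginal_m0 = 0.5 ** n
# Integral of p^heads * (1 - p)^tails over [0, 1] equals 1 / ((n + 1) * C(n, heads)).
marginal_m1 = 1 / ((n + 1) * math.comb(n, heads))
bayes_factor_10 = marginal_m1 / marginal_m0

def bic(log_likelihood, num_params, sample_size):
    """BIC = k * ln(n) - 2 * ln(maximized likelihood); lower is better."""
    return num_params * math.log(sample_size) - 2 * log_likelihood

p_hat = heads / n            # maximum-likelihood estimate under M1
loglik_m0 = n * math.log(0.5)
loglik_m1 = heads * math.log(p_hat) + tails * math.log(1 - p_hat)
bic_m0 = bic(loglik_m0, 0, n)
bic_m1 = bic(loglik_m1, 1, n)

print(round(bayes_factor_10, 2), round(bic_m0, 2), round(bic_m1, 2))
```

With these numbers both criteria lean toward M1: the Bayes factor is above 1 and M1 has the lower BIC. Both verdicts depend on the assumed prior and on the two-model candidate set.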

Bayesian Model Selection

If the true model is not contained in the set of candidate models: must Bayesian methods be justified by their distance-minimizing properties?

It is not trivial that a particular distance function (e.g. K-L divergence) is indeed minimized by the model with the highest posterior!

Bayesian probabilities = probabilities of being close to the true model?

Model Selection and Parameter Estimation

In the elementary parameter estimation case, posterior distributions were independent of decision-theoretic elements (utilities/loss functions)

The reasonableness of a posterior distribution in Bayesian model selection is itself relative to the choice of a distance/loss function

IV. Conclusions

Conclusions (I)

Quality of a model selection method subject to a plethora of problem-specific premises

Model selection methods must be adapted to a specific problem ("engineering")

Bayesian methods in model selection should have an instrumental interpretation

Difficult to separate proper statistics from decision theory

Conclusions (II)

Optimality of an estimator is a highly ambiguous notion

Statistics more akin to scientific modelling than to a branch of mathematics?

More empirical science than inductive logic?

Thanks a lot for your attention!!!

© by Jan Sprenger, Tilburg, September 2007