

Statistics: How to get data and model to fit together?

The field of statistics
Not dry numbers, but the essence in them.

Model vs data: estimation of parameters of interest
Examples of parameters: mean yearly precipitation, the explanatory value one discharge time series has for another, the relationship between stage and discharge. Parameters are typically unknown, but the data give us information that can be useful for estimation.

Model choice
Gives answers to questions like: is the mean precipitation the same in two neighbouring regions? Can we use one discharge time series to say something about another? These answers are not absolute, but are given with a stated precision (confidence or probability).

Data uncertainty
Perfect measurements + perfect models = zero uncertainty (whether it's the model parameters or the models themselves).

Sources of uncertainty: Real measurements come with a built-in uncertainty. Models can't take everything into account; there are unmeasured confounders (local variations in topography and soil in a hydrological model, for instance).

Both problems can be handled by looking at how the measurements spread out, i.e. at the probability distribution of the data, given model and parameters. Our models thus need to contain probability distributions.

This uncertainty has consequences for how sure we can be about our models and parameters.

Data summary statistics (the numbers, not the field)
Ways of summarizing the data (x1, ..., xn):

Mean: x̄ = (x1 + ... + xn)/n

Empirical quantiles: for instance, q(0.5) is the median.
Empirical variance (the scaled sum of squared deviations): s² = ((x1 − x̄)² + ... + (xn − x̄)²)/(n − 1)

Empirical correlation between two quantities, x and y: rxy = Σ(xi − x̄)(yi − ȳ) / √(Σ(xi − x̄)² Σ(yi − ȳ)²)

Histograms count the instances or rates inside intervals, and give an indication of how the data are distributed. The cumulative empirical distribution counts the rate of instances lower than each given value; empirical quantiles are easy to read from it.
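As a minimal sketch, here are these summaries in Python with NumPy (the data arrays are made up purely for illustration):

    import numpy as np

    x = np.array([2.3, 1.7, 3.1, 2.9, 2.0, 2.6, 1.9, 3.4])  # hypothetical data series
    y = np.array([2.1, 1.5, 3.3, 2.7, 2.2, 2.4, 2.0, 3.1])  # a second, related series

    mean = x.mean()                   # (x1 + ... + xn)/n
    median = np.quantile(x, 0.5)      # empirical quantile q(0.5)
    variance = x.var(ddof=1)          # squared deviations summed, divided by n - 1
    corr = np.corrcoef(x, y)[0, 1]    # empirical correlation r_xy

    counts, edges = np.histogram(x, bins=4)  # histogram: counts inside intervals
    print(mean, median, variance, corr, counts)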

Data summary vs the underlying reality (distributions)
While the data are known, what has produced the data is not. This means summary statistics don't represent the reality of the system, only an indication of what this reality might be.

For instance, the histogram from the previous slide was produced by the normal distribution, but this distribution doesn't look exactly the same as the histogram.

Similarly, the empirical mean isn't the distribution mean (the expectation), the empirical correlation isn't the real correlation, the empirical median isn't the distribution median, the empirical variance isn't the distribution variance, etc.

One single distribution can produce a wide variety of outcomes! But the opposite is also the case: a wide variety of distributions could produce the same data! How do we go from data to distribution?

Examples of distributional problems
Regression. What is the functional relationship between x and y?

Forecasting. What to expect from floods, using what's been happening before as input?

Model testing. Is hydrological model A better than B?

Filling in missing data.

The problems expressed distributionally
What's the distribution of y given x?

What is the distribution of future flood events, given the past?

Does model A summarize/predict the distribution of the data better than model B?

What's the joint distribution of the missing and actual data?

Probability

Views on probability
I. The long-term rate of outcomes that fall into a given category. For instance, one can expect that 1/6 of all dice throws give the outcome one.
II. The relationship between a payoff and what you are willing to risk for it. For instance, you might be willing to risk 10 kr in order to get back 60 kr on a bet that the next outcome of the die is one.
III. A formal way of dealing with plausibility (reasoning under uncertainty). A probability of 1/6 for getting one on the die means that you have neither greater nor less belief in that outcome than in the other 5 possible outcomes.

Notation: we will use Pr(something) to denote "the probability of something". I is a frequentist definition, while II and III are Bayesian.

Laws of probability
Examples:

Pr(flood on the west coast)=1.1 means you have calculated incorrectly!

Pr(two or more on the die) = 1 − Pr(one) = 1 − 1/6 = 5/6

Pr(one or two on a single dice throw) = Pr(one)+Pr(two)= 1/6+1/6=1/3

0 ≤ Pr(A) ≤ 1

Pr(A)+Pr(not A)=1

Pr(A or B) = Pr(A) + Pr(B) when A and B are mutually exclusive.

Laws of probability 2: conditional probability
Examples: Pr(rain | overcast)

One throw of the die does not affect the next =>Pr(one on the second throw | one on the first throw) = Pr(one on the second throw).

Pr(one on the first and second throw) =Pr(one on the first throw)* Pr(one on the second throw | one on the first throw) = Pr(one on the first throw)Pr(one on the second throw) =1/6*1/6=1/36.
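A quick simulation sketch (Python with NumPy; the seed and sample size are arbitrary choices) showing the long-term rate interpretation of this product rule:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 1_000_000
    first = rng.integers(1, 7, size=n)   # first throw of a fair die
    second = rng.integers(1, 7, size=n)  # second, independent throw

    # The long-term rate of (one, one) approaches Pr(one)*Pr(one) = 1/36
    rate = np.mean((first == 1) & (second == 1))
    print(rate, 1 / 36)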

Pr(A | B) gives the probability for A in cases where B is true.

From Bayes' formula: if B is independent of A, Pr(A|B) = Pr(A), then A is also independent of B: Pr(B|A) = Pr(B).

Pr(A|B) = Pr(A) means that A is independent of B: B gives no information about A. A dependency between parameters and data is what makes parameter estimation possible.

Pr(A and B)=Pr(A|B)Pr(B)

Since Pr(A and B) = Pr(B|A)Pr(A) also, we get Bayes' formula: Pr(A|B) = Pr(B|A)Pr(A)/Pr(B)

Examples of conditional probabilities
Assume that Pr(rain two consecutive days) = 10%, and that Pr(rain on a given day) = 20%. What's the probability of rain tomorrow if it rains today?

Pr(rain tomorrow | rain today) = Pr(rain today and tomorrow)/Pr(rain today)=10%/20%=50%.

If it's overcast 50% of the time, and it's always overcast when it's raining, what is the probability of rain given overcast?

Pr(rain | overcast) =Pr(overcast and rain)/Pr(overcast)=Pr(overcast | rain)Pr(rain)/Pr(overcast)=100%*20%/50%=40%.
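Both computations are easy to check numerically; a minimal Python sketch using the probabilities from the example (variable names are mine):

    # Probabilities given in the example
    p_rain = 0.20                # Pr(rain on a given day)
    p_rain_two_days = 0.10       # Pr(rain two consecutive days)
    p_overcast = 0.50            # Pr(overcast)
    p_overcast_given_rain = 1.0  # always overcast when it rains

    # Conditional probability: Pr(rain tomorrow | rain today)
    print(p_rain_two_days / p_rain)  # 0.5

    # Bayes' formula: Pr(rain | overcast) = Pr(overcast | rain)Pr(rain)/Pr(overcast)
    print(p_overcast_given_rain * p_rain / p_overcast)  # 0.4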

We say that overcast is evidence for rain: Pr(rain | overcast) > Pr(rain). (PS: I re-derived Bayes' formula here!)

Conditional probability as inferential logic
From the previous example, it was seen that the probability of rain increases when we know it's overcast. In the terminology of probability as inferential logic, overcast is evidence for rain. Evidence is information that increases the probability of something we are uncertain about. It's possible to make rules for evidence, even when exact probabilities are not available.

Examples:
- When A -> B, then B is evidence for A. (If rain -> overcast, then overcast is evidence for rain.)
- When A is evidence for B, then B is evidence for A. (If a flood at position A increases the risk of there being a flood at location B, then ...) Note that the strength of the evidence does not have to be the same both ways.
- If A is evidence for B and B is evidence for C (and there is no direct dependency between A and C), then A is evidence for C. (If Oddgeir mostly speaks the truth and he says it's overcast, then that is evidence for rain.)
- If A is evidence for B, then not-A is evidence for not-B. (Not overcast, i.e. clear skies, is evidence against rain. If you have been searching for the boss inside the building without finding him/her, then that is evidence for him/her not being in the building.)

See "Reasoning under uncertainty" on YouTube for more.

The law of total probability
If one has the conditional probabilities for one thing and the unconditional (marginal) probabilities of another, one can retrieve the unconditional (marginal) distribution of the first thing. This process is called marginalization.

Let's say we have three possibilities spanning the realm of all possible outcomes: B1, B2 or B3. So, one and only one of B1, B2 and B3 can be true. (For instance rain, overcast without rain, and sunny; A could be the event that a person uses his car to get to work.)

Pr(A) = Pr(A and B1) + Pr(A and B2) + Pr(A and B3) = Pr(A|B1)Pr(B1)+Pr(A|B2)Pr(B2)+Pr(A|B3)Pr(B3)

It's the same if there are more (or fewer) possibilities for B.

Example: Assume that the probability of hail on a given day is 2% in the summer half-year and 20% in the winter half-year (these are thus conditional probabilities). What's the probability of hail on an arbitrary day of the year?

Pr(hail) = Pr(hail | summer)Pr(summer) + Pr(hail | winter)Pr(winter) = 2%*50% + 20%*50% = 1% + 10% = 11%
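The same marginalization as a minimal Python sketch (variable names are mine, the numbers are from the example):

    # Law of total probability: marginalizing hail over the two half-years
    p_summer, p_winter = 0.5, 0.5  # Pr(summer), Pr(winter)
    p_hail_given_summer = 0.02     # Pr(hail | summer)
    p_hail_given_winter = 0.20     # Pr(hail | winter)

    p_hail = p_hail_given_summer * p_summer + p_hail_given_winter * p_winter
    print(p_hail)  # 0.11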

Properties of stochastic variables (stuff that has a distribution)
The expectation value is the mean of a distribution, weighted by the probabilities:

E(X) = x1 Pr(X=x1) + ... + xN Pr(X=xN)

For a die, the expectation value is 3.5. For a uniformly distributed variable between 0 and 1, the expectation is 1/2. For a normally distributed variable, the expectation is a parameter, μ.

The standard deviation (sd) is a measure of how much spread you can expect. Technically it's the square root of the variance (Var), defined as:

Var(X) = (x1 − E(X))² Pr(X=x1) + ... + (xN − E(X))² Pr(X=xN), when there are N different possible outcomes.

For a uniformly distributed variable between 0 and 1, the variance is 1/12. For a normally distributed variable, the standard deviation is a parameter, σ (or variance σ²).
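A small Python sketch computing a fair die's expectation and variance directly from these definitions:

    # Expectation and variance of a fair die, straight from the definitions
    outcomes = [1, 2, 3, 4, 5, 6]
    probs = [1 / 6] * 6

    e_x = sum(x * p for x, p in zip(outcomes, probs))                 # E(X) = sum of x_i Pr(X=x_i)
    var_x = sum((x - e_x) ** 2 * p for x, p in zip(outcomes, probs))  # Var(X) = E((X - E(X))^2)
    print(e_x, var_x)  # 3.5 and about 2.92; the sd is the square root, about 1.71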

Covariance and correlation
If two variables X and Y are dependent (so Pr(Y|X) ≠ Pr(Y)), there is a measure for that also. First off, one can define the covariance, which tells how X and Y vary linearly together:

Cov(X,Y) = Σi Σj (xi − E(X))(yj − E(Y)) Pr(X=xi and Y=yj), with i = 1,...,Nx and j = 1,...,Ny,

where Nx and Ny are the numbers of different possible outcomes for X and Y respectively.

Covariance will, however, depend both on the (linear) dependency between X and Y and on the scale of each of them. To remove the latter, we form the correlation:

ρXY = Cov(X,Y) / (sd(X) sd(Y))

Note that −1 ≤ ρXY ≤ 1 always. |ρXY| = 1 means perfect linear dependency.

Also note that the correlation summarizes only linear dependency, not non-linear dependency! It is even possible to have perfect dependency but no correlation!
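A sketch of that last point in Python with NumPy: below, Y is a deterministic (but non-linear) function of X, yet the correlation comes out near zero:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, size=100_000)  # symmetric around zero
    y = x ** 2                            # y is perfectly determined by x, but non-linearly

    # Cov(X,Y) = E(X^3) - E(X)E(X^2) = 0 for a symmetric X, so the correlation is ~0
    print(np.corrcoef(x, y)[0, 1])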

Samples from stochastic variables: the law of large numbers
The data we see are regarded as a sample set from some (unknown) distribution. If we can sample from a statistical distribution enough times, we will eventually see that:

Rates approach the probabilities.

The mean approaches the expectation value.

The empirical variance approaches the distribution variance.

The rate of samples falling inside an interval approaches the interval probability. Thus the histogram approaches the probability density.

Empirical quantiles approach the distributional quantiles. Empirical correlations approach the distributional correlation.
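A minimal law-of-large-numbers sketch in Python with NumPy (the normal distribution and its parameters are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(42)
    for n in (10, 1_000, 100_000):
        x = rng.normal(loc=3.5, scale=2.0, size=n)  # true expectation 3.5, true sd 2.0
        # The empirical mean and variance approach 3.5 and 4.0 as n grows
        print(n, x.mean(), x.var(ddof=1))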

Diagnostic plots concerning probability distributions
One can compare the histogram with the distribution (probability density).

Cumulative rates can be compared to the cumulative distribution function.

One can also plot theoretical quantiles vs sample quantiles. This is called a QQ plot. If the right distribution has been chosen, the points should lie approximately on a straight line.
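A hand-rolled QQ-plot sketch in Python, assuming NumPy, SciPy and Matplotlib are available (the normal sample is just an illustration):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    x = np.random.default_rng(7).normal(size=200)  # sample whose distribution we want to check

    # Theoretical normal quantiles against the sorted sample (the sample quantiles)
    n = len(x)
    theoretical = stats.norm.ppf((np.arange(1, n + 1) - 0.5) / n)
    plt.scatter(theoretical, np.sort(x))
    plt.plot(theoretical, theoretical)  # reference line: points near it indicate a good fit
    plt.xlabel("theoretical quantiles")
    plt.ylabel("sample quantiles")
    plt.show()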

The Bernoulli process and the binomial distribution
A process of independent events having only two outcomes of interest, success and failure, is called a Bernoulli process.

Examples:
- Coin tosses.
- Years where the discharge went over a given threshold in Glomma.
- Incorrect use: rainy days last month (consecutive days are not independent).

If you count the number of successes in n trials, you get the binomial distribution. It is characterized by the success rate, p. This is often an unknown parameter that we'd like to estimate.
- p = probability of heads (p = 50%)
- p = probability of discharge > threshold
E(X) = np, Var(X) = np(1 − p)

In the plotted example, n = 30 and p = 0.3. Related: the negative binomial distribution, which counts the number of failures before the k-th success.
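A sketch of these binomial quantities in Python, assuming SciPy is available, with n and p taken from the plotted example:

    from scipy import stats

    n, p = 30, 0.3  # values from the plotted example
    binom = stats.binom(n, p)

    print(binom.mean(), binom.var())  # E(X) = np = 9.0, Var(X) = np(1 - p) = 6.3
    print(binom.pmf(9))               # probability of exactly 9 successes in 30 trials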

Distributional families: Poisson
The Poisson distribution is what you get when you count the number of events that happen independently in time (a Poisson process) inside a time interval. Examples:
- Number of car accidents with deadly outcome per year.
- Number of times the discharge goes above a given threshold in a given time interval. (PS: strictly speaking not independent!)

The Poisson distribution is characterized by a rate parameter, λ: λ = the rate of deadly traffic accidents, or λ = the rate of threshold exceedances.

If the rate is uncertain in a particular way (gamma distributed) the outcome will be negative binomially distributed.

E(X) = Var(X) = λ.

In this case, λ = 10. (Figure: a Poisson process on a timeline t, with event times t1, t2, t3, t4.)
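The corresponding Poisson sketch (Python with SciPy assumed), using the rate from the figure:

    from scipy import stats

    lam = 10  # the rate from the figure
    pois = stats.poisson(lam)

    print(pois.mean(), pois.var())  # E(X) = Var(X) = lambda = 10
    print(pois.pmf(10))             # probability of exactly 10 events in the interval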

Probability density

For stuff that has a continuous nature (discharge/humidity/temperature) we can't assign probabilities directly, since there's an uncountable number of outcomes.

What we can do instead is to assign a probability density... We use the notation f(x) for probability densities, with x denoting the stochastic variable X that it is the probability density of.

A probability density times the size of a small interval gives (approximately) the probability of falling inside that interval: Pr(x < X ≤ x + Δx) ≈ f(x)Δx.
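A small numerical check of this interpretation (Python with SciPy assumed; the standard normal and the interval are arbitrary choices):

    from scipy import stats

    x, dx = 1.0, 0.01
    norm = stats.norm(loc=0, scale=1)

    # Interval probability Pr(x < X <= x + dx) from the CDF, versus f(x)*dx
    exact = norm.cdf(x + dx) - norm.cdf(x)
    approx = norm.pdf(x) * dx
    print(exact, approx)  # the two agree to several decimals for small dx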