# Statistical Inferences



Jake Blanchard
Spring 2010
Uncertainty Analysis for Engineers

## Introduction

- Statistical inference = process of drawing conclusions from random data
- Conclusions of this process are propositions, for example:
  - Estimates
  - Confidence intervals
  - Credible intervals
  - Rejecting a hypothesis
  - Clustering data points
- Part of this is the estimation of model parameters

## Parameter Estimation

- Point estimation: calculate a single number from a set of observational data
- Interval estimation: determine the interval within which the true parameter lies (along with a confidence level)

## Properties

- Bias: expected value of the estimator does not necessarily equal the parameter
- Consistency: estimator approaches the parameter as n approaches infinity
- Efficiency: smaller variance of the estimator implies higher efficiency
- Sufficiency: estimator utilizes all pertinent information in a sample

## Point Estimation

- Start with a data sample of size N
- Example: estimate the fraction of voters who will vote for a particular candidate (the estimate is based on a random sample of voters)
- Other examples: quality control, clinical trials, software engineering, orbit prediction
- Assume successive samples are statistically independent

## Estimators

- Maximum likelihood
- Method of moments
- Minimum mean squared error
- Bayes estimators
- Cramér-Rao bound
- Maximum a posteriori
- Minimum variance unbiased estimator
- Best linear unbiased estimator
- etc.

## Maximum Likelihood

- Suppose we have a random variable x with pdf $f(x;\theta)$
- Take n samples of x
- What value of $\theta$ will maximize the likelihood of obtaining these n observations?
- Let L = likelihood of observing this set of values for x
- Then maximize L with respect to $\theta$
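The maximum likelihood idea can be written compactly. The following is a sketch in standard notation, assuming the n samples are independent (as assumed earlier for point estimation); the equation slide itself is not preserved in the transcript, so the symbols here are not necessarily the ones the slides used.

$$
L(x_1,\ldots,x_n;\theta)=\prod_{i=1}^{n} f(x_i;\theta),
\qquad
\ln L=\sum_{i=1}^{n}\ln f(x_i;\theta)
$$

$$
\frac{\partial \ln L}{\partial \theta}\bigg|_{\theta=\hat\theta}=0
$$

Maximizing $\ln L$ rather than $L$ gives the same $\hat\theta$ because the logarithm is monotonic, and it turns the product into a sum that is usually easier to differentiate.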

## Example

- Times between successive arrivals of vehicles at an intersection are 1.2, 3, 6.3, 10.1, 5.2, 2.4, and 7.2 seconds
- Assume an exponential distribution
- Find the MLE for the distribution parameter

## Solution
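A minimal Matlab sketch of this solution follows. The transcript preserves neither the parameter symbol nor the solution slide, so the rate parameterization $f(t;\lambda)=\lambda e^{-\lambda t}$ and the names `lambda_hat`, `mean_hat`, and `lambda_num` are assumptions made for illustration. Setting the derivative of the log-likelihood to zero gives the closed form; `fminbnd`, with an arbitrarily chosen search interval, serves only as a numerical cross-check.

```matlab
% Interarrival times from the example (seconds)
t = [1.2 3 6.3 10.1 5.2 2.4 7.2];
n = numel(t);

% Assumed parameterization (not stated in the transcript):
%   f(t; lambda) = lambda*exp(-lambda*t)
%   ln L = n*log(lambda) - lambda*sum(t)
% Setting d(ln L)/d(lambda) = 0 gives lambda_hat = n/sum(t)
lambda_hat = n/sum(t)          % estimated rate (1/s), about 0.198
mean_hat = 1/lambda_hat        % estimated mean interarrival time (s), about 5.06

% Cross-check: minimize the negative log-likelihood numerically
% over an assumed search interval of 0.01 to 1 per second
negLogL = @(lam) -(n*log(lam) - lam*sum(t));
lambda_num = fminbnd(negLogL, 0.01, 1)
```

Under this assumption the MLE of the rate is the reciprocal of the sample mean, so the estimate of the mean interarrival time is the sample mean itself.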

## 2-Parameter Example

- Measure cycles to failure of saturated sand (25, 20, 28, 33, 26 cycles)
- Assume a lognormal distribution

## Solution
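One way to carry out this two-parameter estimate is sketched below. It assumes the usual lognormal model in which ln(x) is normally distributed; the names `lambda_hat` (mean of ln x) and `zeta_hat` (standard deviation of ln x) are illustrative notation, since the solution slide is not preserved. For a normal variate the MLE of the standard deviation divides by n, not n-1.

```matlab
% Cycles to failure from the example
x = [25 20 28 33 26];
n = numel(x);
lnx = log(x);                  % lognormal assumption: ln(x) is normal

% MLEs of the parameters of ln(x) (notation is an assumption)
lambda_hat = mean(lnx)                              % about 3.26
zeta_hat = sqrt(sum((lnx - lambda_hat).^2)/n)       % about 0.163 (divides by n)
```

The value of `zeta_hat` is slightly smaller than the n-1 based sample standard deviation of ln(x), which is the usual small-sample difference between the MLE and the unbiased variance estimate.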

## Method of Moments

- Use sample moments (mean, variance, etc.) to set distribution parameters

## Example

- Times between successive arrivals of vehicles at an intersection are 1.2, 3, 6.3, 10.1, 5.2, 2.4, and 7.2 seconds
- Assume an exponential distribution
- Mean = 5.06

## 2-Parameter Example

- Measure cycles to failure of saturated sand (25, 20, 28, 33, 26 cycles)
- Assume a lognormal distribution
- Mean = 26.4
- Standard deviation = 4.72
- Solve for the two lognormal parameters: 3.26 (mean of ln x) and 0.177 (standard deviation of ln x)

## Solution
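The lognormal values quoted for the method of moments can be reproduced with a few lines of Matlab. This is a sketch that assumes the standard lognormal moment relations, mean $m=e^{\lambda+\zeta^2/2}$ and variance $m^2(e^{\zeta^2}-1)$; the symbols `lambda` and `zeta` are illustrative notation for the mean and standard deviation of ln(x).

```matlab
% Cycles to failure from the example
x = [25 20 28 33 26];

% Sample moments
m = mean(x)                      % 26.4
s = std(x)                       % about 4.72 (n-1 in the denominator)

% Invert the assumed lognormal moment relations
zeta = sqrt(log(1 + (s/m)^2))    % about 0.177
lambda = log(m) - zeta^2/2       % about 3.26
```

Unlike maximum likelihood, this approach matches the moments of x itself, which is why its estimate of the standard deviation of ln(x) differs from the one obtained by working directly with the logarithms of the data.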

## Minimum Mean Square Error

- Choose parameters to minimize the mean squared error between the measured data and a continuous distribution
- Essentially a curve fit

## Approach

- Excel
  - Guess parameters
  - Calculate sum of squares of errors
  - Vary guessed parameters to minimize error (use the Solver)
- Matlab
  - Use the fminsearch function

## Example

- Solar insolation data
- Gather data
- Form histogram
- Normalize histogram by number of samples and width of bins

## Scatter Plot and Histogram

[Figure: scatter plot and histogram of the data]

## Normal and Weibull Fits

[Figure: normal and Weibull fits to the normalized histogram]

- Mean = 3980 (fit)
- Mean = 3915 (data)

## Excel Screen Shot

[Figure: Excel screen shots]

## Solver Set Up

[Figure: Solver set up]

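## Matlab Script

```matlab
% Fit a normal pdf to a normalized histogram by least squares
y = xlsread('matlabfit.xlsx', 'normal');   % read the data from sheet 'normal'
[s, t] = hist(y, 8);                       % counts s and bin centers t for 8 bins
binwidth = t(2) - t(1);                    % centers are equally spaced; 8 centers span only 7 widths
s = s/binwidth/numel(y);                   % normalize so the histogram has unit area
zin = [mean(t), std(t)];                   % initial guess for [mu, sigma]
sumoferrs(zin, t, s)                       % error at the initial guess
zout = fminsearch(@(z) sumoferrs(z, t, s), zin)   % minimize the sum of squared errors
sumoferrs(zout, t, s)                      % error at the fitted parameters
xplot = t(1):(t(end) - t(1))/(10*numel(t)):t(end);
yplot = curve(xplot, zout);
plot(t, s, '+', xplot, yplot)              % data as markers, fitted pdf as a line

% Functions (at the end of the script file, or in separate files)
function f = curve(x, z)
% Normal pdf with mean z(1) and standard deviation z(2)
mu = z(1);
sig = z(2);
f = normpdf(x, mu, sig);
end

function f = sumoferrs(z, x, y)
% Sum of squared differences between the fitted curve and the data
f = sum((curve(x, z) - y).^2);
end
```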

## Sampling Distributions

- How do we assess the inaccuracy in using the sample mean to estimate the population mean?
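One way to answer this question is to work out the mean and variance of the sample mean itself. The sketch below assumes, as stated earlier for point estimation, that the n observations are independent draws from a population with mean $\mu$ and standard deviation $\sigma$; the derivation slide is not preserved, so this is a standard reconstruction, not the slide's own text.

$$
\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i,
\qquad
E[\bar{x}]=\frac{1}{n}\sum_{i=1}^{n}E[x_i]=\mu
$$

$$
\mathrm{Var}(\bar{x})=\frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(x_i)=\frac{\sigma^2}{n},
\qquad
\sigma_{\bar{x}}=\frac{\sigma}{\sqrt{n}}
$$

The variance step uses independence, which is what allows the variance of the sum to be written as the sum of the variances.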

## Conclusions

- Expected value of the sample mean is equal to the population mean
- Mean of the sample is an unbiased estimator of the mean of the population
- Variance of the sample mean is the sampling error
- By the CLT, the sample mean is Gaussian for large n
- The sample mean $\bar{x}$ is $N(\mu, \sigma/\sqrt{n})$
- Estimator for $\mu$ improves as n increases

## Sample Mean with Unknown $\sigma$

- In the previous derivation, $\sigma$ is the population standard deviation
- This is generally not known
- All we have is the sample variance ($s^2$)
- If the sample size is small, the distribution will not be Gaussian
- We can use a Student's t-distribution
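For reference, the statistic and density involved can be written as follows. This is the standard form, given here as a sketch because the formula itself is not preserved in the transcript; it assumes the underlying population is normal, and $f=n-1$ denotes the degrees of freedom.

$$
T=\frac{\bar{x}-\mu}{s/\sqrt{n}}
$$

$$
f_T(t)=\frac{\Gamma\!\left(\frac{f+1}{2}\right)}{\sqrt{\pi f}\,\Gamma\!\left(\frac{f}{2}\right)}
\left(1+\frac{t^2}{f}\right)^{-(f+1)/2},
\qquad -\infty<t<\infty
$$

As $f$ grows, this density approaches the standard normal density, which is why the distinction matters mainly for small samples.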

- f = number of degrees of freedom

## Distribution of Sample Variance

Conclusions:

- Sample variance is an unbiased estimator of the population variance
- For normal variates:
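These statements about the sample variance can be written out as below. The formulas are the standard results, given as a sketch because the equation slides are not preserved; the chi-square result assumes the population is normal.

$$
s^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2,
\qquad
E[s^2]=\sigma^2
$$

$$
\frac{(n-1)\,s^2}{\sigma^2}\sim\chi^2_{\,n-1},
\qquad
\mathrm{Var}(s^2)=\frac{2\sigma^4}{n-1}
$$

Dividing by $n-1$ instead of $n$ is what makes $s^2$ unbiased.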

- Chi-square distribution with n-1 dof
- This approaches a normal distribution for large n

## Testing Hypotheses

- Used to make decisions about a population based on a sample
- Steps:
  - Define null and alternative hypotheses
  - Identify test statistic
  - Estimate test statistic, based on sample
  - Specify level of significance
    - Type I error: rejecting the null hypothesis when it is true
    - Type II error: accepting the null hypothesis when it is false
  - Define region of rejection (one tail or two?)

## Level of Significance

- Type I error: level of significance ($\alpha$)
- Typically 1-5%
- Type II error ($\beta$) is seldom used

## Example

- We need yield strength of rebar to be at least 38 psi
- We order a sample of 25 rebars
- Sample mean from 25 tests is 37.5 psi
- Standard deviation of rebar strength = 3 psi
- Use one-sided test
- Hypotheses: null $\mu = 38$; alt. […] 9
- Use Chi-Square distribution

## Solution

- So we reject the null hypothesis and the supplier is not acceptable

## Confidence Intervals

- In addition to the mean, standard deviation, etc., confidence intervals can help us characterize populations
- For example, the mean gives us a best estimate of the expected value of the population, but confidence intervals can help indicate the accuracy of the mean
- A confidence interval is defined as the range within which a parameter will lie with a prescribed probability

## CI of the Mean

- First, we'll assume the variance is known
- The central limit theorem states that the pdf of the mean of n individual observations from any distribution with finite mean and variance approaches a normal distribution as n approaches infinity
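With the variance known, the interval follows from treating the standardized sample mean as a standard normal variate. The expressions below are a sketch in standard notation, since the equation slide is not preserved; $k_{\alpha/2}$ is an assumed symbol for the standard normal value exceeded with probability $\alpha/2$.

$$
P\!\left(-k_{\alpha/2}<\frac{\bar{x}-\mu}{\sigma/\sqrt{n}}\le k_{\alpha/2}\right)=1-\alpha,
\qquad
k_{\alpha/2}=\Phi^{-1}\!\left(1-\frac{\alpha}{2}\right)
$$

$$
\langle\mu\rangle_{1-\alpha}=\left(\bar{x}-k_{\alpha/2}\frac{\sigma}{\sqrt{n}},\;\;\bar{x}+k_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)
$$

Rearranging the inequality inside the probability statement for $\mu$ gives the two limits of the interval.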

- $\Phi$ is the CDF of the standard normal variate

## Example

- Measure strength of rebar
- 25 samples
- Mean = 37.5 psi
- Standard deviation = 3 psi
- Find 95% confidence interval for mean

## Solution
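The arithmetic for this example can be laid out as follows. This is a sketch assuming the known-variance interval $\bar{x}\pm k_{\alpha/2}\,\sigma/\sqrt{n}$, with $k_{\alpha/2}$ an assumed symbol for the standard normal value at cumulative probability $1-\alpha/2$.

$$
\alpha=0.05,\qquad k_{\alpha/2}=\Phi^{-1}(0.975)\approx 1.96
$$

$$
\bar{x}\pm k_{\alpha/2}\frac{\sigma}{\sqrt{n}}
=37.5\pm 1.96\cdot\frac{3}{\sqrt{25}}
=37.5\pm 1.18
\;\Rightarrow\;(36.32,\;38.68)\ \text{psi}
$$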

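- So the mean of the strength falls between 36.3 and 38.7 psi with a 95% confidence level

## The Script

```matlab
% 95% confidence interval for the mean, population standard deviation known
mu = 37.5;                      % sample mean (psi)
sig = 3;                        % known standard deviation (psi)
n = 25;                         % sample size
alpha = 0.05;                   % 1 - confidence level
ka = -norminv(1 - alpha/2)      % lower critical value, about -1.96
k1ma = -ka                      % upper critical value, about +1.96
cil = mu + ka*sig/sqrt(n)       % lower limit, about 36.3
ciu = mu + k1ma*sig/sqrt(n)     % upper limit, about 38.7
```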

## Variance Not Known

- What if the variance of the population ($\sigma^2$) is not known?
- That is, we only know the variance of the sample
- Let s = standard deviation of sample
- We can show that $(\bar{x}-\mu)/(s/\sqrt{n})$ does not conform to a normal distribution, especially for small n
- We can show that this quantity follows a Student's t-distribution with n-1 degrees of freedom (f)
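The resulting interval has the same shape as the known-variance case, with the normal critical value replaced by a t value. The form below is a sketch in standard notation; $t_{\alpha/2,\,f}$ is an assumed symbol for the value of the t-distribution with $f=n-1$ degrees of freedom that is exceeded with probability $\alpha/2$.

$$
P\!\left(-t_{\alpha/2,\,f}<\frac{\bar{x}-\mu}{s/\sqrt{n}}\le t_{\alpha/2,\,f}\right)=1-\alpha
$$

$$
\langle\mu\rangle_{1-\alpha}=\left(\bar{x}-t_{\alpha/2,\,f}\frac{s}{\sqrt{n}},\;\;\bar{x}+t_{\alpha/2,\,f}\frac{s}{\sqrt{n}}\right)
$$

Because $t_{\alpha/2,\,f}$ exceeds the corresponding normal value for any finite $f$, the interval is wider than the known-variance interval for the same $n$ and the same numerical standard deviation.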

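## Example

- Measure strength of rebar
- 25 samples
- Mean = 37.5 psi
- s = 3.5 psi
- Find 95% confidence interval for mean

## Script

```matlab
% 95% confidence interval for the mean, variance estimated from the sample
xbar = 37.5;                      % sample mean (psi)
s = 3.5;                          % sample standard deviation (psi)
n = 25;                           % sample size
alpha = 0.05;                     % 1 - confidence level
ka = -tinv(1 - alpha/2, n - 1);   % lower critical value of t with n-1 dof
kb = -tinv(alpha/2, n - 1);       % upper critical value of t with n-1 dof
cil = xbar + ka*s/sqrt(n)         % lower limit
ciu = xbar + kb*s/sqrt(n)         % upper limit
```

Result is 36.06, 38.94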

## One-Sided Confidence Limit

- Sometimes we only care about the upper or lower bound
- Two cases: lower limit and upper limit
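The two one-sided limits can be written as below. This is a sketch in standard notation for the known-variance case, since the formulas are not preserved in the transcript; $k_{\alpha}=\Phi^{-1}(1-\alpha)$ is an assumed symbol for the standard normal value exceeded with probability $\alpha$.

$$
\text{Lower:}\quad P\!\left(\mu>\bar{x}-k_{\alpha}\frac{\sigma}{\sqrt{n}}\right)=1-\alpha
$$

$$
\text{Upper:}\quad P\!\left(\mu<\bar{x}+k_{\alpha}\frac{\sigma}{\sqrt{n}}\right)=1-\alpha
$$

When the variance is estimated from a small sample, the same expressions apply with $\sigma$ replaced by $s$ and $k_{\alpha}$ replaced by the t value $t_{\alpha,\,n-1}$. All of $\alpha$ sits in one tail, so the critical value is smaller than in the two-sided case at the same confidence level.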

## Example

- Measure the strength of 100 steel specimens
- Mean = 2200 kgf; s = 220 kgf
- Specify 95% confidence limit of mean
- Assume $\sigma$ = s = 220 kgf
- $1-\alpha$ = 0.95; $\alpha$ = 0.05
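A short Matlab sketch of this calculation follows, assuming the one-sided lower limit $\bar{x}-k_{\alpha}\,\sigma/\sqrt{n}$ with the normal critical value (reasonable here because n = 100 is large). The variable names are illustrative, and `norminv` requires the Statistics Toolbox.

```matlab
% One-sided lower 95% confidence limit of the mean, n = 100
xbar = 2200;                     % sample mean (kgf)
s = 220;                         % sample standard deviation, used as sigma (kgf)
n = 100;                         % number of specimens
alpha = 0.05;                    % 1 - confidence level
ka = norminv(1 - alpha)          % one-sided critical value, about 1.645
lower = xbar - ka*s/sqrt(n)      % lower limit, about 2164 kgf
```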

- Manufacturer has 95% confidence that the mean yield strength is at least 2164 kgf

## Example

- Now only 15 steel specimens
- Mean = 2200 kgf; s = 220 kgf
- Specify 95% confidence limit of mean
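With only 15 specimens the sample standard deviation is a less reliable stand-in for the population value, so a t critical value is the natural choice. The sketch below assumes the one-sided lower limit $\bar{x}-t_{\alpha,\,n-1}\,s/\sqrt{n}$; the variable names are illustrative, and `tinv` requires the Statistics Toolbox.

```matlab
% One-sided lower 95% confidence limit of the mean, n = 15
xbar = 2200;                     % sample mean (kgf)
s = 220;                         % sample standard deviation (kgf)
n = 15;                          % number of specimens
alpha = 0.05;                    % 1 - confidence level
ta = tinv(1 - alpha, n - 1)      % one-sided t value with 14 dof, about 1.761
lower = xbar - ta*s/sqrt(n)      % lower limit, about 2100 kgf
```

The limit is lower than for 100 specimens because the standard error is larger and the t value exceeds the normal value.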

- Manufacturer has 95% confidence that the mean yield strength is at least 2100 kgf

## Confidence Interval of Variance
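For a normal population, an upper confidence limit on the variance follows from the chi-square distribution of the scaled sample variance. The form below is a sketch in standard notation, since the equation slide is not preserved; $c_{\alpha,\,n-1}$ is an assumed symbol for the value of the chi-square variate with $n-1$ degrees of freedom at cumulative probability $\alpha$.

$$
P\!\left(\frac{(n-1)\,s^2}{\sigma^2}>c_{\alpha,\,n-1}\right)=1-\alpha
$$

$$
\sigma^2<\frac{(n-1)\,s^2}{c_{\alpha,\,n-1}}
\quad\text{with confidence } 1-\alpha
$$

The lower tail of the chi-square distribution is used because a small value of $(n-1)s^2/\sigma^2$ corresponds to a large $\sigma^2$.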

## Example

- 25 storms; sample variance for measured runoff is 0.36 in²
- Find upper 95% confidence limit for variance
- So we can say, with 95% confidence, that the upper bound of the variance of the runoff is 0.624 in² and the upper bound of the standard deviation is 0.79 in

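## Script

```matlab
% Upper 95% confidence limit for the variance of runoff (25 storms)
s2 = 0.36;                  % sample variance (in^2); renamed from var, which shadows a built-in
n = 25;                     % number of storms
alpha = 0.05;               % 1 - confidence level
c = chi2inv(alpha, n - 1)   % chi-square value at cumulative probability alpha, 24 dof
ci = 1/c*s2*(n - 1)         % upper limit on the variance, about 0.624 in^2
si = sqrt(ci)               % upper limit on the standard deviation, about 0.79 in
```

## Measurement Theory

- Suppose we are measuring distances
- $d_1, d_2, \ldots, d_n$ are measured distances
- Distance estimate is the sample mean, $\bar{d}=\frac{1}{n}\sum_{i=1}^{n}d_i$
- Standard error is $s/\sqrt{n}$
- s = standard deviation of sample
- d is the expected value of the mean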