Est3 tutorial3mejorado


The simple linear regression model

The following illustrates how to fit the simple linear regression model with the NCSS package.

The data consist of a Mathematics achievement score for a random sample of 10 first-year university students, together with each student's Calculus score. The goal is to draw inferences from a model in which the Mathematics score (x) explains the Calculus score (y), and to obtain Calculus predictions for Mathematics scores of 50 and 60. The data are entered directly into NCSS as shown below:

Matematicas   Calculo
39            65
43            78
21            52
64            82
57            92
47            89
28            73
75            98
34            56
52            75

Note that each response is paired with its observed explanatory variable. Recall that column names are assigned by right-clicking the column, choosing Variable Info, and then Variable Name. Do not type accents or the letter ñ. Also take care to avoid typing mistakes; to copy a cell, drag the small square at its lower-right corner.

1. Open the Linear Regression window. On the menu, select Analysis, then Regression/Correlation, then Linear Regression. The procedure window will appear. On the menu, select File, then New Template. This resets the procedure to its default settings.

2. Specify the variables. In the Linear Regression window, select the Variables tab. Fill the Y: Dependent Variable(s) box with the response Calculo by double-clicking the box and selecting it. Fill the X: Independent Variable(s) box with the explanatory variable Matematicas by double-clicking the box and selecting it.


3. Specify the report items. Enter the X values at which predictions are wanted in the Predict Y at these X Values box, separated by spaces; for this illustration, type 50 60. Click the Reports tab and select the quantities you want in the analysis. Select all of them.

4. Run the procedure. On the Run menu, select Run Procedure. You will obtain the following output:

Linear Regression Report (22/10/2010 10:32:31 a.m.)
Y = Calculo, X = Matematicas

Linear Regression Plot Section

[Scatter plot: Calculo vs Matematicas, with Matematicas (20.0 to 80.0) on the horizontal axis and Calculo (50.0 to 100.0) on the vertical axis]

Run Summary Section
Parameter             Value        Parameter                  Value
Dependent Variable    Calculo      Rows Processed             10
Independent Variable  Matematicas  Rows Used in Estimation    10
Frequency Variable    None         Rows with X Missing        0
Weight Variable       None         Rows with Freq Missing     0
Intercept             40.7842      Rows Prediction Only       0
Slope                 0.7656       Sum of Frequencies         10
R-Squared             0.7052       Sum of Weights             10.0000
Correlation           0.8398       Coefficient of Variation   0.1145
Mean Square Error     75.75323     Square Root of MSE         8.703633



Summary Statement
The equation of the straight line relating Calculo and Matematicas (the regression equation) is estimated as: Calculo = (40.7842) + (0.7656)*(Matematicas), using the 10 observations in this dataset. The y-intercept (β0), the estimated value of Calculo when Matematicas is zero, is 40.7842 with a standard error of 8.5069. The slope (β1), the estimated change in Calculo per unit change in Matematicas, is 0.7656 with a standard error of 0.1750. The value of R-Squared (r²), the proportion of the variation in Calculo that can be accounted for by variation in Matematicas, is 0.7052. The correlation between Calculo and Matematicas is 0.8398.
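The least-squares fit in the summary statement can be reproduced with a short Python sketch (illustrative only, not NCSS output; the data are hard-coded from the table above):

```python
# Simple linear regression of Calculo (y) on Matematicas (x) by least squares.
x = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]  # Matematicas
y = [65, 78, 52, 82, 92, 89, 73, 98, 56, 75]  # Calculo
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)                       # corrected sum of squares of x
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))    # corrected cross products
syy = sum((yi - my) ** 2 for yi in y)                       # corrected sum of squares of y

slope = sxy / sxx                   # ~ 0.7656
intercept = my - slope * mx         # ~ 40.7842
r = sxy / (sxx * syy) ** 0.5        # ~ 0.8398
r2 = r ** 2                         # ~ 0.7052
print(round(intercept, 4), round(slope, 4), round(r, 4), round(r2, 4))
```

The printed values match the Intercept, Slope, Correlation, and R-Squared entries of the Run Summary Section.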

A significance test that the slope is zero resulted in a t-value of 4.3750. The significance level of this t-test is 0.0024. Since 0.0024 < 0.0500, the hypothesis that the slope is zero is rejected (H0 is rejected).

The estimated slope is 0.7656. The lower limit of the 95% confidence interval for the slope is 0.3620 and the upper limit is 1.1691; that is, (0.3620, 1.1691) is the confidence interval for the slope β1. The estimated intercept is 40.7842. The lower limit of the 95% confidence interval for the intercept is 21.1673 and the upper limit is 60.4010; (21.1673, 60.4010) is the confidence interval for the intercept β0.
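The standard errors, the t-test, and both confidence intervals can be sketched with the standard formulas (assumed here, not NCSS output; t(0.975, 8) = 2.306 is taken from tables):

```python
# Inference for the slope and intercept of the fit above.
import math

x = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]
y = [65, 78, 52, 82, 92, 89, 73, 98, 56, 75]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = my - slope * mx

sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)                                  # ~ 75.7532
s = math.sqrt(mse)                                   # ~ 8.7036
se_slope = s / math.sqrt(sxx)                        # ~ 0.1750
se_intercept = s * math.sqrt(1 / n + mx ** 2 / sxx)  # ~ 8.5069
t_slope = slope / se_slope                           # ~ 4.3750

t_crit = 2.306                                       # t(0.975, df = 8), from tables
ci_slope = (slope - t_crit * se_slope, slope + t_crit * se_slope)
ci_intercept = (intercept - t_crit * se_intercept, intercept + t_crit * se_intercept)
print(ci_slope, ci_intercept)   # ~ (0.3620, 1.1691) and ~ (21.167, 60.401)
```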

Descriptive Statistics Section
Parameter            Dependent    Independent
Variable             Calculo      Matematicas
Count                10           10
Mean                 76.0000      46.0000
Standard Deviation   15.1144      16.5798
Minimum              52.0000      21.0000
Maximum              98.0000      75.0000



Regression Estimation Section
                                  Intercept    Slope
Parameter                         B(0)         B(1)
Regression Coefficients           40.7842      0.7656
Lower 95% Confidence Limit        21.1673      0.3620
Upper 95% Confidence Limit        60.4010      1.1691
Standard Error                    8.5069       0.1750
Standardized Coefficient          0.0000       0.8398
T Value                           4.7943       4.3750
Prob Level                        0.0014       0.0024
Reject H0 (Alpha = 0.0500)        Yes          Yes
Power (Alpha = 0.0500)            0.9863       0.9677

Regression of Y on X              40.7842      0.7656
Inverse Regression from X on Y    26.0655      1.0855
Orthogonal Regression of Y and X  34.7968      0.8957

Notes: The above report shows the least-squares estimates of the intercept and slope followed by the corresponding standard errors, confidence intervals, and hypothesis tests. Note that these results are based on several assumptions that should be validated before they are used.

Estimated Model
(40.784155214228) + (.765561843168957) * (Matematicas)



Bootstrap Section (The basic idea of the bootstrap is to treat the sample as if it were the population, so that resampling it estimates population quantities; this, of course, assumes the sample stands in for a larger pool of individuals, hence the use of the values 50 and 60.)
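The resampling idea behind this section can be sketched in Python (a sketch only: the observation-sampling scheme and 3000 samples mirror the NCSS run, but the seed and percentile method are assumptions, so the numbers will differ slightly from the report):

```python
# Bootstrap the slope by resampling (x, y) rows with replacement.
import random

x = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]
y = [65, 78, 52, 82, 92, 89, 73, 98, 56, 75]
n = len(x)

def slope_of(xs, ys):
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((xi - mx) ** 2 for xi in xs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
    return sxy / sxx

rng = random.Random(1)          # fixed seed so the sketch is reproducible
slopes = []
for _ in range(3000):
    idx = [rng.randrange(n) for _ in range(n)]      # resample rows with replacement
    xs, ys = [x[i] for i in idx], [y[i] for i in idx]
    if len(set(xs)) > 1:                            # skip a degenerate resample (all x equal)
        slopes.append(slope_of(xs, ys))

slopes.sort()
lo, hi = slopes[int(0.025 * len(slopes))], slopes[int(0.975 * len(slopes))]
print(sum(slopes) / len(slopes), (lo, hi))   # bootstrap mean and percentile 95% CI
```

The bootstrap mean should fall near the original slope estimate 0.7656, and the interval near the report's 95% limits.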

------ Estimation Results ------   |   ------ Bootstrap Confidence Limits ------
Parameter              Estimate    |   Conf. Level    Lower      Upper
Intercept
  Original Value       40.7842     |   0.9000         25.9293    54.0720
  Bootstrap Mean       40.5099     |   0.9500         22.4576    56.5937
  Bias (BM - OV)       -0.2743     |   0.9900         15.6398    65.6676
  Bias Corrected       41.0584
  Standard Error       8.5692
Slope
  Original Value       0.7656      |   0.9000         0.4716     1.0306
  Bootstrap Mean       0.7761      |   0.9500         0.3845     1.0980
  Bias (BM - OV)       0.0105      |   0.9900         0.1654     1.2003
  Bias Corrected       0.7551
  Standard Error       0.1699
Correlation
  Original Value       0.8398      |   0.9000         0.7338     1.0000
  Bootstrap Mean       0.8263      |   0.9500         0.7189     1.0000
  Bias (BM - OV)       -0.0135     |   0.9900         0.6959     1.0000
  Bias Corrected       0.8533
  Standard Error       0.0991
R-Squared
  Original Value       0.7052      |   0.9000         0.5160     1.0000
  Bootstrap Mean       0.6925      |   0.9500         0.4875     1.0000
  Bias (BM - OV)       -0.0127     |   0.9900         0.4428     1.0000
  Bias Corrected       0.7179
  Standard Error       0.1496
Standard Error of Estimate
  Original Value       8.7036      |   0.9000         7.5912     12.0945
  Bootstrap Mean       7.7916      |   0.9500         7.3206     12.7867
  Bias (BM - OV)       -0.9121     |   0.9900         6.7540     14.5573
  Bias Corrected       9.6157
  Standard Error       1.3796
Orthogonal Intercept
  Original Value       34.7968     |   0.9000         19.6108    52.2714
  Bootstrap Mean       33.6771     |   0.9500         14.9337    59.1995
  Bias (BM - OV)       -1.1197     |   0.9900         9.5260     77.3036
  Bias Corrected       35.9165
  Standard Error       12.8285
Orthogonal Slope
  Original Value       0.8957      |   0.9000         0.4728     1.1779
  Bootstrap Mean       0.9269      |   0.9500         0.3089     1.2517
  Bias (BM - OV)       0.0312      |   0.9900         -0.0725    1.3568
  Bias Corrected       0.8646
  Standard Error       0.3270



Bootstrap Section

------ Estimation Results ------   |   ------ Bootstrap Confidence Limits ------
Parameter              Estimate    |   Conf. Level    Lower      Upper
Predicted Mean and Confidence Limits of Calculo when Matematicas = 50.0000
  Original Value       79.0622     |   0.9000         74.2706    83.1174
  Bootstrap Mean       79.3135     |   0.9500         73.1039    83.8730
  Bias (BM - OV)       0.2512      |   0.9900         70.4090    85.4468
  Bias Corrected       78.8110
  Standard Error       2.7673
Predicted Mean and Confidence Limits of Calculo when Matematicas = 60.0000
  Original Value       86.7179     |   0.9000         80.4961    91.1953
  Bootstrap Mean       87.0742     |   0.9500         78.2697    92.1576
  Bias (BM - OV)       0.3563      |   0.9900         73.7842    94.5974
  Bias Corrected       86.3616
  Standard Error       3.4398
Predicted Value and Prediction Limits of Calculo when Matematicas = 50.0000
  Original Value       79.0622     |   0.9000         58.7435    97.2475
  Bootstrap Mean       78.7827     |   0.9500         55.4098    100.4693
  Bias (BM - OV)       -0.2796     |   0.9900         47.9876    109.9048
  Bias Corrected       79.3418
  Standard Error       12.0660
Predicted Value and Prediction Limits of Calculo when Matematicas = 60.0000
  Original Value       86.7179     |   0.9000         66.2959    105.5009
  Bootstrap Mean       86.6635     |   0.9500         61.8813    108.2358
  Bias (BM - OV)       -0.0544     |   0.9900         55.0788    116.5996
  Bias Corrected       86.7723
  Standard Error       12.2026

Sampling Method = Observation, Confidence Limit Type = Reflection, Number of Samples = 3000.

Notes: The main purpose of this report is to present the bootstrap confidence intervals of various parameters. All gross outliers should have been removed. The sample size should be at least 50 and the sample should be 'representative' of the population it was drawn from.



Correlation and R-Squared Section
                                     Pearson                   Spearman Rank
Parameter                            Correlation   R-Squared   Correlation
Estimated Value                      0.8398        0.7052      0.8788
Lower 95% Conf. Limit (r dist'n)     0.4233
Upper 95% Conf. Limit (r dist'n)     0.9540
Lower 95% Conf. Limit (Fisher's z)   0.4460                    0.5578
Upper 95% Conf. Limit (Fisher's z)   0.9612                    0.9711
Adjusted (Rbar)                                    0.6684
T-Value for H0: Rho = 0              4.3750        4.3750      5.2086
Prob Level for H0: Rho = 0           0.0024        0.0024      0.0008

Notes:The confidence interval for the Pearson correlation assumes that X and Y follow the bivariate normal distribution. This is a different assumption from linear regression which assumes that X is fixed and Y is normally distributed.

Two confidence intervals are given. The first is based on the exact distribution of Pearson's correlation. The second is based on Fisher's z transformation, which approximates the exact distribution using the normal distribution. Why are both provided? Because most books mention only Fisher's approximate method, it will often be needed for homework. However, the exact method should be used whenever possible.
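Fisher's z approximation mentioned above can be sketched directly (illustrative, using r = 0.8398 and n = 10 from the report; the normal quantile 1.959964 is from tables):

```python
# Fisher's z confidence interval for the Pearson correlation.
import math

r, n = 0.8398, 10
z = math.atanh(r)                  # Fisher transform: z = 0.5 * ln((1+r)/(1-r))
se = 1 / math.sqrt(n - 3)          # approximate standard error of z
z_crit = 1.959964                  # 97.5% standard normal quantile
lo = math.tanh(z - z_crit * se)    # back-transform the limits
hi = math.tanh(z + z_crit * se)
print(round(lo, 4), round(hi, 4))  # ~ (0.4460, 0.9612)
```

These limits match the Fisher's z row for the Pearson correlation in the table above.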

The confidence limits can be used to test hypotheses about the correlation. To test the hypothesis that rho is a specific value, say r0, check to see if r0 is between the confidence limits. If it is, the null hypothesis that rho = r0 is not rejected. If r0 is outside the limits, the null hypothesis is rejected.

Spearman's Rank correlation is calculated by replacing the original data with their ranks. This correlation is used when some of the assumptions may be invalid.



Analysis of Variance Section (ANOVA table)
Source       DF   Sum of Squares   Mean Square   F-Ratio   Prob Level   Power (5%)
Intercept    1    57760            57760
Slope        1    1449.974         1449.974      19.1408   0.0024       0.9677
Error        8    606.0259         75.75323
Adj. Total   9    2056             228.4444
Total        10   59816

s = Square Root(75.75323) = 8.703633

Notes: The above report shows the F-Ratio for testing whether the slope is zero, the degrees of freedom, and the mean square error. The mean square error, which estimates the variance of the residuals, is used extensively in the calculation of hypothesis tests and confidence intervals.
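The sums of squares in the ANOVA table follow from the same quantities used in the fit (a sketch, not NCSS output):

```python
# Decompose the variation of Calculo into slope and error sums of squares.
x = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]
y = [65, 78, 52, 82, 92, 89, 73, 98, 56, 75]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
syy = sum((yi - my) ** 2 for yi in y)        # Adj. Total SS = 2056

ss_slope = sxy ** 2 / sxx                    # ~ 1449.974
ss_error = syy - ss_slope                    # ~ 606.0259
mse = ss_error / (n - 2)                     # ~ 75.75323
f_ratio = ss_slope / mse                     # ~ 19.1408
print(round(ss_slope, 3), round(ss_error, 4), round(f_ratio, 4))
```

Note that the F-Ratio equals the square of the slope's t-value (4.3750² ≈ 19.14).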

Summary Matrices
            X'X      X'X      X'Y      X'X Inverse      X'X Inverse
Index       0        1        2        0                1
0           10       460      760      0.9552951        -1.859337E-02
1           460      23634    36854    -1.859337E-02    4.042037E-04
2 (Y'Y)                       59816
Determinant          24740                              4.042037E-05

Variance - Covariance Matrix of Regression Coefficients
            VC(b)        VC(b)
Index       0            1
0           72.36669     -1.408508
1           -1.408508    3.061974E-02



Tests of Assumptions Section
                                       Test      Prob       Is the Assumption Reasonable
Assumption/Test                        Value     Level      at the 0.2000 Level of Significance?
Residuals follow Normal Distribution?
Shapiro Wilk                           0.9162    0.326279   Yes
Anderson Darling                       0.3987    0.365069   Yes
D'Agostino Skewness                    0.5271    0.598118   Yes
D'Agostino Kurtosis                    -1.3574   0.174648   No
D'Agostino Omnibus                     2.1204    0.346381   Yes

Constant Residual Variance?
Modified Levene Test                   0.0089    0.927267   Yes

Relationship is a Straight Line?
Lack of Linear Fit F(0, 0) Test        0.0000    0.000000   No

No Serial Correlation?
Evaluate the Serial-Correlation report and the Durbin-Watson test if you have equal-spaced, time series data.

Notes: A 'Yes' means there is not enough evidence to make this assumption seem unreasonable. This lack of evidence may be because the sample size is too small, the assumptions of the test itself are not met, or the assumption is valid. A 'No' means that the assumption is not reasonable. However, since these tests are related to sample size, you should assess the role of sample size in the tests by also evaluating the appropriate plots and graphs. A large dataset (say N > 500) will often fail at least one of the normality tests because it is hard to find a large dataset that is perfectly normal.

Normality and Constant Residual Variance:
Possible remedies for the failure of these assumptions include using a transformation of Y such as the log or square root, correcting data-recording errors found by looking into outliers, adding additional independent variables, using robust regression, or using bootstrap methods.

Straight-Line:
Possible remedies for the failure of this assumption include using nonlinear regression or polynomial regression.



Serial Correlation of Residuals Section
      Serial              Serial              Serial
Lag   Correlation   Lag   Correlation   Lag   Correlation
1     0.3611        9                   17
2     -0.2875       10                  18
3     -0.4307       11                  19
4     -0.3158       12                  20
5                   13                  21
6                   14                  22
7                   15                  23
8                   16                  24

Notes: Each serial correlation is the Pearson correlation calculated between the original series of residuals and the residuals lagged the specified number of periods. This feature of residuals is only meaningful for data obtained in time order. One of the assumptions is that none of these serial correlations is significant. Starred correlations are those for which |Fisher's Z| > 1.645, which indicates that the serial correlation is 'large.'

If serial correlation is detected in time series data, the remedy is to account for it either by replacing Y with first differences or by fitting the serial pattern using a method such as that proposed by Cochrane and Orcutt.

Durbin-Watson Test For Serial Correlation
                                            Did the Test Reject
Parameter                                   Value     H0: Rho(1) = 0?
Durbin-Watson Value                         1.1737
Prob. Level: Positive Serial Correlation    0.1078    Yes
Prob. Level: Negative Serial Correlation    0.9474    No

Notes: The Durbin-Watson test was created to test for first-order serial correlation in regression data taken over time. If the rows of your dataset do not represent successive time periods, you should ignore this test.

This report gives the probability of rejecting the null hypothesis of no first-order serial correlation. Possible remedies for serial correlation were given in the Notes to the Serial Correlation report, above.
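The Durbin-Watson statistic itself is a simple function of successive residuals (a sketch, not NCSS output):

```python
# Durbin-Watson statistic: sum of squared successive residual differences
# divided by the residual sum of squares.
x = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]
y = [65, 78, 52, 82, 92, 89, 73, 98, 56, 75]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
intercept = my - slope * mx
e = [yi - intercept - slope * xi for xi, yi in zip(x, y)]   # residuals in row order

dw = sum((e[t] - e[t - 1]) ** 2 for t in range(1, n)) / sum(et ** 2 for et in e)
print(round(dw, 4))   # ~ 1.1737
```

Values near 2 indicate no first-order serial correlation; values well below 2 suggest positive serial correlation.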



PRESS Section
                           From PRESS   From Regular
Parameter                  Residuals    Residuals
Sum of Squared Residuals   888.5684     606.0259
Sum of |Residuals|         84.87208     69.78011
R-Squared                  0.5678       0.7052

Notes: A PRESS residual is found by estimating the regression equation without the observation, predicting the dependent variable, and subtracting the predicted value from the actual value. The PRESS values are calculated from these PRESS residuals. The Regular values are the corresponding calculations based on the regular residuals.

The PRESS values are often used to compare models in a multiple-regression variable selection. They show how well the model predicts observations that were not used in the estimation.
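For simple linear regression, the leave-one-out predictions do not require refitting: the PRESS residual is e_i / (1 - h_i), where h_i is the hat (leverage) diagonal. A sketch using this shortcut (assumed formula, not NCSS output):

```python
# PRESS sum of squares via the leverage shortcut e_i / (1 - h_i).
x = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]
y = [65, 78, 52, 82, 92, 89, 73, 98, 56, 75]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
intercept = my - slope * mx
e = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
h = [1 / n + (xi - mx) ** 2 / sxx for xi in x]          # leverage of each row

press = sum((ei / (1 - hi)) ** 2 for ei, hi in zip(e, h))   # ~ 888.57
sse = sum(ei ** 2 for ei in e)                              # ~ 606.03
print(round(press, 4), round(sse, 4))
```

PRESS is always larger than the regular residual sum of squares; a big gap signals rows that are hard to predict when left out.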

Predicted Values and Confidence Limits Section
              Predicted   Standard   Lower 95%      Upper 95%
Matematicas   Calculo     Error      Confidence     Confidence
(X)           (Yhat|X)    of Yhat    Limit of Y|X   Limit of Y|X
50.0000       79.0622     2.8399     72.5133        85.6112
60.0000       86.7179     3.6847     78.2210        95.2147

The confidence interval estimates the mean of the Y values in a large sample of individuals with this value of X. The interval is only accurate if all of the linear regression assumptions are valid.

Predicted Values and Prediction Limits Section
              Predicted   Standard   Lower 95%      Upper 95%
Matematicas   Calculo     Error      Prediction     Prediction
(X)           (Yhat|X)    of Yhat    Limit of Y|X   Limit of Y|X
50.0000       79.0622     9.1552     57.9502        100.1743
60.0000       86.7179     9.4515     64.9228        108.5130

The prediction interval estimates the predicted value of Y for a single individual with this value of X. The interval is only accurate if all of the linear regression assumptions are valid.
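Both intervals at Matematicas = 50 can be sketched with the standard formulas (the x = 60 case is identical with 50 replaced by 60; t(0.975, 8) = 2.306 is assumed from tables):

```python
# Confidence limits for the mean of Y, and prediction limits for a single Y,
# at x0 = 50. The prediction interval adds 1 inside the square root.
import math

x = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]
y = [65, 78, 52, 82, 92, 89, 73, 98, 56, 75]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
intercept = my - slope * mx
s = math.sqrt(sum((yi - intercept - slope * xi) ** 2
                  for xi, yi in zip(x, y)) / (n - 2))

x0 = 50.0
yhat = intercept + slope * x0                              # ~ 79.0622
se_mean = s * math.sqrt(1 / n + (x0 - mx) ** 2 / sxx)      # ~ 2.8399
se_pred = s * math.sqrt(1 + 1 / n + (x0 - mx) ** 2 / sxx)  # ~ 9.1552
t_crit = 2.306                                             # t(0.975, df = 8)
print((yhat - t_crit * se_mean, yhat + t_crit * se_mean))  # ~ (72.51, 85.61)
print((yhat - t_crit * se_pred, yhat + t_crit * se_pred))  # ~ (57.95, 100.17)
```

The extra 1 under the square root is why prediction limits for an individual are much wider than confidence limits for the mean.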



Residual Plots Section

[Plots: Residuals of Calculo vs Matematicas; |Residuals of Calculo| vs Matematicas; RStudent of Calculo vs Matematicas; Residuals of Calculo vs Row; Serial Correlation of Residuals (Residuals of Calculo vs Lagged Residuals of Calculo); Histogram of Residuals of Calculo]



[Plot: Normal Probability Plot of Residuals of Calculo, with Expected Normals on the horizontal axis and Residuals of Calculo on the vertical axis]

Original Data Section
                             Predicted
      Matematicas   Calculo  Calculo
Row   (X)           (Y)      (Yhat|X)   Residual
1     39.0000       65.0000  70.6411    -5.6411
2     43.0000       78.0000  73.7033    4.2967
3     21.0000       52.0000  56.8610    -4.8610
4     64.0000       82.0000  89.7801    -7.7801
5     57.0000       92.0000  84.4212    7.5788
6     47.0000       89.0000  76.7656    12.2344
7     28.0000       73.0000  62.2199    10.7801
8     75.0000       98.0000  98.2013    -0.2013
9     34.0000       56.0000  66.8133    -10.8133
10    52.0000       75.0000  80.5934    -5.5934

This report provides a data list that may be used to verify whether the correct variables were selected.



Predicted Values and Confidence Limits of Means
                              Predicted  Standard  Lower 95%    Upper 95%
      Matematicas   Calculo   Calculo    Error     Conf. Limit  Conf. Limit
Row   (X)           (Y)       (Yhat|X)   of Yhat   of Y Mean|X  of Y Mean|X
1     39.0000       65.0000   70.6411    3.0126    63.6940      77.5881
2     43.0000       78.0000   73.7033    2.8019    67.2420      80.1646
3     21.0000       52.0000   56.8610    5.1684    44.9425      68.7794
4     64.0000       82.0000   89.7801    4.1828    80.1345      99.4258
5     57.0000       92.0000   84.4212    3.3586    76.6762      92.1662
6     47.0000       89.0000   76.7656    2.7579    70.4059      83.1253
7     28.0000       73.0000   62.2199    4.1828    52.5742      71.8655
8     75.0000       98.0000   98.2013    5.7729    84.8889      111.5137
9     34.0000       56.0000   66.8133    3.4619    58.8302      74.7964
10    52.0000       75.0000   80.5934    2.9458    73.8004      87.3864

The confidence interval estimates the mean of the Y values in a large sample of individuals with this value of X. The interval is only accurate if all of the linear regression assumptions are valid.

Predicted Values and Prediction Limits
                              Predicted  Standard  Lower 95%     Upper 95%
      Matematicas   Calculo   Calculo    Error     Prediction    Prediction
Row   (X)           (Y)       (Yhat|X)   of Yhat   Limit of Y|X  Limit of Y|X
1     39.0000       65.0000   70.6411    9.2103    49.4022       91.8800
2     43.0000       78.0000   73.7033    9.1435    52.6183       94.7883
3     21.0000       52.0000   56.8610    10.1225   33.5183       80.2036
4     64.0000       82.0000   89.7801    9.6566    67.5120       112.0482
5     57.0000       92.0000   84.4212    9.3292    62.9081       105.9343
6     47.0000       89.0000   76.7656    9.1301    55.7115       97.8197
7     28.0000       73.0000   62.2199    9.6566    39.9518       84.4880
8     75.0000       98.0000   98.2013    10.4441   74.1171       122.2855
9     34.0000       56.0000   66.8133    9.3668    45.2133       88.4132
10    52.0000       75.0000   80.5934    9.1886    59.4044       101.7824

The prediction interval estimates the predicted value of Y for a single individual with this value of X (a point prediction). The interval is only accurate if all of the linear regression assumptions are valid.



Working-Hotelling Simultaneous Confidence Band
                              Predicted  Standard  Lower 95%    Upper 95%
      Matematicas   Calculo   Calculo    Error     Conf. Band   Conf. Band
Row   (X)           (Y)       (Yhat|X)   of Yhat   of Y Mean|X  of Y Mean|X
1     39.0000       65.0000   70.6411    3.0126    43.7750      97.5072
2     43.0000       78.0000   73.7033    2.8019    48.7157      98.6909
3     21.0000       52.0000   56.8610    5.1684    10.7692      102.9527
4     64.0000       82.0000   89.7801    4.1828    52.4778      127.0824
5     57.0000       92.0000   84.4212    3.3586    54.4692      114.3731
6     47.0000       89.0000   76.7656    2.7579    52.1709      101.3602
7     28.0000       73.0000   62.2199    4.1828    24.9176      99.5222
8     75.0000       98.0000   98.2013    5.7729    46.7188      149.6838
9     34.0000       56.0000   66.8133    3.4619    35.9405      97.6860
10    52.0000       75.0000   80.5934    2.9458    54.3231      106.8637

This is a confidence band for the regression line for all possible values of X from -infinity to +infinity. The confidence coefficient is the proportion of time that this procedure yields a band that includes the true regression line when a large number of samples are taken using the same X values as in this sample.

Residual Section
                              Predicted                            Percent
      Matematicas   Calculo   Calculo               Standardized   Absolute
Row   (X)           (Y)       (Yhat|X)   Residual   Residual       Error
1     39.0000       65.0000   70.6411    -5.6411    -0.6908        8.6786
2     43.0000       78.0000   73.7033    4.2967     0.5214         5.5086
3     21.0000       52.0000   56.8610    -4.8610    -0.6941        9.3480
4     64.0000       82.0000   89.7801    -7.7801    -1.0193        9.4879
5     57.0000       92.0000   84.4212    7.5788     0.9439         8.2378
6     47.0000       89.0000   76.7656    12.2344    1.4820         13.7466
7     28.0000       73.0000   62.2199    10.7801    1.4124         14.7673
8     75.0000       98.0000   98.2013    -0.2013    -0.0309        0.2054
9     34.0000       56.0000   66.8133    -10.8133   -1.3541        19.3094
10    52.0000       75.0000   80.5934    -5.5934    -0.6830        7.4578

The residual is the difference between the actual and the predicted Y values. The formula is Residual = Y - Yhat. The Percent Absolute Error is 100 * |Residual| / Y.



Residual Diagnostics Section
      Matematicas                         Hat
Row   (X)           Residual   RStudent   Diagonal   Cook's D   MSEi
1     39.0000       -5.6411    -0.6664    0.1198     0.0325     81.4104
2     43.0000       4.2967     0.4963     0.1036     0.0157     83.6328
3     21.0000       -4.8610    -0.6698    0.3526     0.1312     81.3609
4     64.0000       -7.7801    -1.0222    0.2310     0.1560     75.3310
5     57.0000       7.5788     0.9366     0.1489     0.0779     76.9340
6     47.0000       12.2344    1.6277     0.1004     0.1226     62.8055
7     28.0000       10.7801    1.5249     0.2310     0.2995     64.9877
8     75.0000       -0.2013    -0.0289    0.4399     0.0004     86.5648
9     34.0000       -10.8133   -1.4427    0.1582     0.1723     66.7321
10    52.0000       -5.5934    -0.6583    0.1146     0.0302     81.5275

Outliers are rows that are separated from the rest of the data. Influential rows are those whose omission results in a relatively large change in the results. This report lets you see both.

An outlier may be defined as a row in which |RStudent| > 2. A moderately influential row is one with a Cook's D > 0.5. A heavily influential row is one with a Cook's D > 1.

MSEi is the value of the Mean Square Error (the average of the sum of squared residuals) calculated with each row omitted.
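The diagnostic columns can be sketched for one row using the standard formulas (illustrative, not NCSS output; row 6, x = 47, is shown):

```python
# Leverage (hat diagonal), Cook's D, leave-one-out MSE, and RStudent for row 6.
import math

x = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]
y = [65, 78, 52, 82, 92, 89, 73, 98, 56, 75]
n, p = len(x), 2                     # p = number of estimated coefficients
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
intercept = my - slope * mx
e = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
sse = sum(ei ** 2 for ei in e)
mse = sse / (n - p)
h = [1 / n + (xi - mx) ** 2 / sxx for xi in x]

i = 5                                # row 6 (0-based index)
cooks_d = (e[i] ** 2 / (p * mse)) * h[i] / (1 - h[i]) ** 2   # ~ 0.1226
mse_i = (sse - e[i] ** 2 / (1 - h[i])) / (n - p - 1)         # MSEi, ~ 62.8055
rstudent = e[i] / math.sqrt(mse_i * (1 - h[i]))              # ~ 1.6277
print(round(h[i], 4), round(cooks_d, 4), round(rstudent, 4))
```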

Leave One Row Out Section
Row   RStudent   DFFITS    Cook's D   CovRatio    DFBETAS(0)   DFBETAS(1)
1     -0.6664    -0.2459   0.0325     1.3121      -0.1673      0.1000
2     0.4963     0.1687    0.0157     1.3598      0.0835       -0.0316
3     -0.6698    -0.4943   0.1312     * 1.7819    -0.4811      0.4184
4     -1.0222    -0.5602   0.1560     1.2859      0.2799       -0.4218
5     0.9366     0.3918    0.0779     1.2119      -0.1086      0.2245
6     1.6277     0.5438    0.1226     0.7641      0.1429       0.0345
7     1.5249     0.8357    0.2995     0.9570      0.7733       -0.6293
8     -0.0289    -0.0256   0.0004     * 2.3315    0.0174       -0.0225
9     -1.4427    -0.6255   0.1723     0.9219      -0.5199      0.3794
10    -0.6583    -0.2368   0.0302     1.3081      0.0083       -0.0844

Each column gives the impact on some aspect of the linear regression of omitting that row.

RStudent represents the size of the residual. DFFITS represents the change in the fitted value of a row. Cook's D summarizes the change in the fitted values of all rows. CovRatio represents the amount of change in the determinant of the covariance matrix. DFBETAS(0) and DFBETAS(1) give the amount of change in the intercept and slope.



Outlier Detection Chart
      Matematicas                              Standardized
Row   (X)        Residual                      Residual                     RStudent
1     39.0000    -5.6411   |||............     -0.6908   ||.............    -0.6664   |..............
2     43.0000    4.2967    ||.............     0.5214    ||.............    0.4963    |..............
3     21.0000    -4.8610   |||............     -0.6941   ||.............    -0.6698   |..............
4     64.0000    -7.7801   |||||..........     -1.0193   ||||...........    -1.0222   |..............
5     57.0000    7.5788    ||||...........     0.9439    |||............    0.9366    |..............
6     47.0000    12.2344   |||||||........     1.4820    ||||||.........    1.6277    |..............
7     28.0000    10.7801   ||||||.........     1.4124    |||||..........    1.5249    |..............
8     75.0000    -0.2013   |..............     -0.0309   |..............    -0.0289   |..............
9     34.0000    -10.8133  ||||||.........     -1.3541   |||||..........    -1.4427   |..............
10    52.0000    -5.5934   |||............     -0.6830   ||.............    -0.6583   |..............

Outliers are rows that are separated from the rest of the data. Since outliers can have dramatic effects on the results, corrective action, such as elimination, must be carefully considered. Outlying rows should not automatically be removed unless a good reason for their removal can be given.

An outlier may be defined as a row in which |RStudent| > 2. Rows with this characteristic have been starred.

Influence Detection Chart
      Matematicas
Row   (X)        DFFITS                        Cook's D                     DFBETAS(1)
1     39.0000    -0.2459   |..............     0.0325    |..............    0.1000    |..............
2     43.0000    0.1687    |..............     0.0157    |..............    -0.0316   |..............
3     21.0000    -0.4943   |..............     0.1312    |..............    0.4184    |..............
4     64.0000    -0.5602   |..............     0.1560    |..............    -0.4218   |..............
5     57.0000    0.3918    |..............     0.0779    |..............    0.2245    |..............
6     47.0000    0.5438    |..............     0.1226    |..............    0.0345    |..............
7     28.0000    0.8357    |..............     0.2995    ||.............    -0.6293   |..............
8     75.0000    -0.0256   |..............     0.0004    |..............    -0.0225   |..............
9     34.0000    -0.6255   |..............     0.1723    |..............    0.3794    |..............
10    52.0000    -0.2368   |..............     0.0302    |..............    -0.0844   |..............

Influential rows are those whose omission results in a relatively large change in the results. They are not necessarily harmful. However, they will distort the results if they are also outliers. The impact of influential rows should be studied very carefully. Their accuracy should be double-checked.

DFFITS is the standardized change in Yhat when the row is omitted. A row is influential when DFFITS > 1 for small datasets (N < 30) or when DFFITS > 2*SQR(1/N) for medium to large datasets.

Cook's D gives the influence of each row on the Yhats of all the rows. Cook suggests investigating all rows having a Cook's D > 0.5. Rows in which Cook's D > 1.0 are very influential.

DFBETAS(1) is the standardized change in the slope when this row is omitted. DFBETAS(1) > 1 for small datasets (N < 30) and DFBETAS(1) > 2/SQR(N) for medium and large datasets are indicative of influential rows.



Outlier & Influence Chart
      Matematicas  RStudent                    Cook's D                     Hat Diagonal
Row   (X)          (Outlier)                   (Influence)                  (Leverage)
1     39.0000    -0.6664   |..............     0.0325    |..............    0.1198    |..............
2     43.0000    0.4963    |..............     0.0157    |..............    0.1036    |..............
3     21.0000    -0.6698   |..............     0.1312    |..............    0.3526    |||||||||||....
4     64.0000    -1.0222   |..............     0.1560    |..............    0.2310    |||||..........
5     57.0000    0.9366    |..............     0.0779    |..............    0.1489    ||.............
6     47.0000    1.6277    |..............     0.1226    |..............    0.1004    |..............
7     28.0000    1.5249    |..............     0.2995    ||.............    0.2310    |||||..........
8     75.0000    -0.0289   |..............     0.0004    |..............    0.4399    |||||||||||||||
9     34.0000    -1.4427   |..............     0.1723    |..............    0.1582    ||.............
10    52.0000    -0.6583   |..............     0.0302    |..............    0.1146    |..............

Outliers are rows that are separated from the rest of the data. Influential rows are those whose omission results in a relatively large change in the results. This report lets you see both.

An outlier may be defined as a row in which |RStudent| > 2. A moderately influential row is one with a Cook's D > 0.5. A heavily influential row is one with a Cook's D > 1.

Inverse Prediction of X Means
                             Predicted                Lower 95%     Upper 95%
      Calculo   Matematicas  Matematicas              Conf. Limit   Conf. Limit
Row   (Y)       (X)          (Xhat|Y)     X-Xhat|Y    of X Mean|Y   of X Mean|Y
1     65.0000   39.0000      31.6315      7.3685      11.7810       40.4270
2     78.0000   43.0000      48.6125      -5.6125     39.6772       59.5577
3     52.0000   21.0000      14.6505      6.3495      -22.2829      27.4640
4     82.0000   64.0000      53.8374      10.1626     45.5434       68.1613
5     92.0000   57.0000      66.8997      -9.8997     56.8331       93.0462
6     89.0000   47.0000      62.9810      -15.9810    53.7409       85.2860
7     73.0000   28.0000      42.0813      -14.0813    30.4075       50.7401
8     98.0000   75.0000      74.7371      0.2629      62.6604       108.9237
9     56.0000   34.0000      19.8754      14.1246     -11.5925      31.2433
10    75.0000   52.0000      44.6938      7.3062      34.3891       53.9934

This confidence interval estimates the mean of X in a large sample of individuals with this value of Y. This method of inverse prediction is also called 'calibration.'
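The point prediction in the Xhat|Y column is simply the fitted line solved for X (a sketch, not NCSS output; the confidence limits require additional formulas not shown here):

```python
# Inverse ("calibration") point prediction: Xhat = (Y - intercept) / slope.
x = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]
y = [65, 78, 52, 82, 92, 89, 73, 98, 56, 75]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
intercept = my - slope * mx

xhat = [(yi - intercept) / slope for yi in y]
print([round(v, 4) for v in xhat[:3]])   # ~ 31.6315, 48.6125, 14.6505
```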



Inverse Prediction of X Individuals
                             Predicted                Lower 95%      Upper 95%
      Calculo   Matematicas  Matematicas              Prediction     Prediction
Row   (Y)       (X)          (Xhat|Y)     X-Xhat|Y    Limit of X|Y   Limit of X|Y
1     65.0000   39.0000      31.6315      7.3685      -7.9089        60.1169
2     78.0000   43.0000      48.6125      -5.6125     17.2054        82.0295
3     52.0000   21.0000      14.6505      6.3495      -37.0380       42.2191
4     82.0000   64.0000      53.8374      10.1626     23.9947        89.7100
5     92.0000   57.0000      66.8997      -9.8997     39.1685        110.7108
6     89.0000   47.0000      62.9810      -15.9810    34.8652        104.1618
7     73.0000   28.0000      42.0813      -14.0813    8.0918         73.0559
8     98.0000   75.0000      74.7371      0.2629      47.2329        124.3511
9     56.0000   34.0000      19.8754      14.1246     -27.7306       47.3815
10    75.0000   52.0000      44.6938      7.3062      11.8213        76.5612

This prediction interval estimates the predicted value of X for a single individual with this value of Y. This method of inverse prediction is also called 'calibration.'