SIMPLE AND MULTIPLE REGRESSION
Relationships between variables
• Between discrete variables (example: see the Titanic data)
• Between continuous variables (regression!)
• Between discrete and continuous variables (regression!)
> Performance in Mathematics
> Number of books at home
[Figure: histogram of REGR factor score 1 for analysis 1 — Std. dev. = 1.00, Mean = 0.00, N = 10791]
[Figure: histogram of How many books at home Q19 — Std. dev. = 1.30, Mean = 3.8, N = 10670]
PISA 2003
> Performance in Mathematics
> Number of books at home
PISA 2003
[Figure: scatterplot of REGR factor score 1 for analysis 1 versus How many books at home Q19 — PISA 2003]
[Figure: box plots of math performance (Rendiment Mat) by How many books at home Q19, with category Ns — PISA 2003]
PISA 2003: Linear regression?
Report: REGR factor score 1 for analysis 1

How many books at home Q19    Mean        N      Std. dev.
0-10 books                   -1.05056      375    .99645
11-25 books                   -.63467     1155    .93701
26-100 books                  -.17133     3575    .91640
101-200 books                  .13506     2372    .86960
201-500 books                  .44117     1927    .86448
More than 500 books            .50517     1266    .96605
Total                          .00661    10670    .99317

PISA 2003: Linear regression?
Linear Regression
ANOVA(b)

Model 1       Sum of squares     df    Mean square       F        Sig.
Regression          1628.360       1      1628.360    1952.838   .000(a)
Residual            8895.433   10668          .834
Total              10523.792   10669

a. Predictors: (Constant), How many books at home Q19
b. Dependent variable: REGR factor score 1 for analysis 1
Coefficients(a)

                              Unstandardized       Standardized
Model 1                       B      Std. error    Beta            t        Sig.
(Constant)                  -1.126     .027                      -41.532    .000
How many books at home Q19    .301     .007         .393          44.191    .000

a. Dependent variable: REGR factor score 1 for analysis 1
* We first load the PISAespanya.sav file.
* This is the syntax file for the SPSS analysis.
* Q38: How often do these things happen in your math class?
* Students don't listen to what the teacher says.
CROSSTABS
  /TABLES=subnatio BY st38q02
  /FORMAT= AVALUE TABLES
  /STATISTIC=CHISQ
  /CELLS= COUNT ROW .

FACTOR
  /VARIABLES pv1math pv2math pv3math pv4math pv5math pv1math1 pv2math1
   pv3math1 pv4math1 pv5math1 pv1math2 pv2math2 pv3math2 pv4math2 pv5math2
   pv1math3 pv2math3 pv3math3 pv4math3 pv5math3 pv1math4 pv2math4 pv3math4
   pv4math4 pv5math4
  /MISSING LISTWISE
  /ANALYSIS pv1math pv2math pv3math pv4math pv5math pv1math1 pv2math1
   pv3math1 pv4math1 pv5math1 pv1math2 pv2math2 pv3math2 pv4math2 pv5math2
   pv1math3 pv2math3 pv3math3 pv4math3 pv5math3 pv1math4 pv2math4 pv3math4
   pv4math4 pv5math4
  /PRINT INITIAL EXTRACTION FSCORE
  /PLOT EIGEN ROTATION
  /CRITERIA FACTORS(1) ITERATE(25)
  /EXTRACTION ML
  /ROTATION NOROTATE
  /SAVE REG(ALL) .

GRAPH
  /SCATTERPLOT(BIVAR)=st19q01 WITH fac1_1
  /MISSING=LISTWISE .

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT fac1_1
  /METHOD=ENTER st19q01
  /PARTIALPLOT ALL
  /SCATTERPLOT=(*ZRESID ,*ZPRED )
  /RESIDUALS HIST(ZRESID) NORM(ZRESID) .
library(foreign)
help(read.spss)
data = read.spss("G:/DATA/PISAdata2003/ReducedDataSpain.sav",
                 use.value.labels=TRUE, to.data.frame=TRUE)
names(data)
 [1] "SUBNATIO" "SCHOOLID" "ST03Q01"  "ST19Q01"  "ST26Q04"  "ST26Q05"
 [7] "ST27Q01"  "ST27Q02"  "ST27Q03"  "ST30Q02"  "EC07Q01"  "EC07Q02"
[13] "EC07Q03"  "EC08Q01"  "IC01Q01"  "IC01Q02"  "IC01Q03"  "IC02Q01"
[19] "IC03Q01"  "MISCED"   "FISCED"   "HISCED"   "PARED"    "PCMATH"
[25] "RMHMWK"   "CULTPOSS" "HEDRES"   "HOMEPOS"  "ATSCHL"   "STUREL"
[31] "BELONG"   "INTMAT"   "INSTMOT"  "MATHEFF"  "ANXMAT"   "MEMOR"
[37] "COMPLRN"  "COOPLRN"  "TEACHSUP" "ESCS"     "W.FSTUWT" "OECD"
[43] "UH"       "FAC1.1"
attach(data)
mean(FAC1.1)
[1] -8.95814e-16
tabulate(ST19Q01)
[1]  106    0   15 1266 1927 2372 3575 1155  375

> table(ST19Q01)
ST19Q01
               Miss             Invalid                 N/A More than 500 books
                106                   0                  15                1266
      201-500 books       101-200 books        26-100 books         11-25 books
               1927                2372                3575                1155
         0-10 books
                375
Effect of Cultural possessions of the family

[Figure: partial regression plot — dependent variable: REGR factor score 1 for analysis 1; X axis: Cultural possessions of the family (WLE)]
Data
Variables Y and X observed on a sample of size n:
(yi, xi), i = 1, 2, ..., n
Covariance and correlation
Scatterplot for various values of correlation
Correlation coefficient r = 0, even though there is an exact (non-linear!) functional relationship
> cbind(x, y)
        x   y
 [1,] -10 100
 [2,]  -9  81
 [3,]  -8  64
 [4,]  -7  49
 [5,]  -6  36
 [6,]  -5  25
 [7,]  -4  16
 [8,]  -3   9
 [9,]  -2   4
[10,]  -1   1
[11,]   0   0
[12,]   1   1
[13,]   2   4
[14,]   3   9
[15,]   4  16
[16,]   5  25
[17,]   6  36
[18,]   7  49
[19,]   8  64
[20,]   9  81
[21,]  10 100
>
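The zero correlation can be checked numerically. A minimal Python (NumPy) sketch reproducing the cbind(x, y) data above:

```python
import numpy as np

# same data as the R listing: y = x^2 on a symmetric range
x = np.arange(-10, 11)
y = x**2

# Pearson correlation is zero because cov(x, x^2) = E[x^3] = 0 on a symmetric range
r = np.corrcoef(x, y)[0, 1]
print(r)  # (numerically) 0, despite the exact functional relation y = x^2
```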
Simple Linear Regression

Variables Y, X

E(Y | X) = α + βX
Var(Y | X) = σ²

Linear relation: y = 1 + .6 X

Linear relation and sample data

Regression Model

Yi = α + β Xi + εi

εi: normally distributed with mean zero and variance σ²
Sample Data: Scatterplot
Fitted regression line
a = 0.5789, b = 0.6270
Fitted and true regression lines: a = 0.5789, b = 0.6270; α = 1, β = .6
Fitted and true regression lines in repeated (20) sampling
α = 1, β = .6
OLS estimate of beta (under repeated sampling)
α = 1, β = .6
Estimate of beta for different samples (100):
0.619 0.575 0.636 0.543 0.555 0.594 0.611 0.584 0.576 ......
>
> mean(bs)
[1] 0.6042086
> sd(bs)
[1] 0.03599894
>
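The repeated-sampling experiment can be sketched in Python. Here α = 1 and β = .6 are the slides' true values, while σ = 3 and the uniform design for x are assumptions (the slides do not state them), so the numbers only roughly match the 0.604 / 0.036 shown above:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, sigma = 1.0, 0.6, 3.0   # alpha, beta from the slides; sigma assumed
n, reps = 100, 100                   # 100 samples of size 100

x = rng.uniform(-15, 15, n)          # assumed fixed design, reused in every sample
Sxx = np.sum((x - x.mean()) ** 2)

bs = []                              # OLS slope estimate from each simulated sample
for _ in range(reps):
    y = alpha + beta * x + rng.normal(0, sigma, n)
    bs.append(np.sum((x - x.mean()) * (y - y.mean())) / Sxx)

print(np.mean(bs), np.std(bs, ddof=1))  # mean near 0.6, sd near sigma/sqrt(Sxx)
```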
REGRESSION Analysis of the Simulated Data
(with R and other software )
Fitted regression line:
a = 1.0232203, b = 0.6436286 (α = 1, β = .6)
Regression Analysis

regression = lm(Y ~ X)
summary(regression)

Call:
lm(formula = Y ~ X)

Residuals:
    Min      1Q  Median      3Q     Max
-6.0860 -2.1429 -0.1863  1.9695  9.4817

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.0232     0.3188    3.21  0.00180 **
X             0.6436     0.0377   17.07  < 2e-16 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 3.182 on 98 degrees of freedom
Multiple R-Squared: 0.7483, Adjusted R-squared: 0.7458
F-statistic: 291.4 on 1 and 98 DF, p-value: < 2.2e-16
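Every number in the summary(regression) output follows from closed-form OLS formulas. A sketch on synthetic data (the slides' simulated dataset is not available, so the estimates differ slightly from 1.0232 / 0.6436):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.uniform(-15, 15, n)              # assumed design
y = 1 + 0.6 * x + rng.normal(0, 3, n)    # true alpha = 1, beta = .6 as in the slides

Sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx   # slope estimate
a = y.mean() - b * x.mean()                          # intercept estimate
resid = y - (a + b * x)
s2 = resid @ resid / (n - 2)             # residual variance (df = n - 2)
se_b = np.sqrt(s2 / Sxx)                 # "Std. Error" of the slope
t = b / se_b                             # "t value"
r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)  # "Multiple R-Squared"
print(a, b, se_b, t, r2)
```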
Regression Analysis with Stata
. use "E:\Albert\COURSES\cursDAS\AS2003\data\MONT.dta", clear
. regress y x
      Source |       SS       df       MS          Number of obs =     100
-------------+------------------------------       F(  1,    98) =  291.42
       Model |  2950.73479     1  2950.73479       Prob > F      =  0.0000
    Residual |  992.280727    98  10.1253135       R-squared     =  0.7483
-------------+------------------------------       Adj R-squared =  0.7458
       Total |  3943.01551    99  39.8284395       Root MSE      =   3.182

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .6436286   .0377029    17.071   0.000     .5688085    .7184488
       _cons |    1.02322   .3187931     3.210   0.002     .3905858    1.655855
------------------------------------------------------------------------------
. predict yh
. graph yh y x, c(s.) s(io)
Regression analysis with SPSS
Model summary

Model 1: R = .865(a), R² = .748, Adjusted R² = .746, Std. error of the estimate = 3.1820
a. Predictors: (Constant), X

ANOVA(b)

Model 1       Sum of squares   df   Mean square      F       Sig.
Regression         2950.735     1      2950.735   291.422   .000(a)
Residual            992.281    98        10.125
Total              3943.016    99

a. Predictors: (Constant), X
b. Dependent variable: Y

Coefficients(a)

Model 1         B     Std. error   Beta       t       Sig.
(Constant)    1.023     .319                3.210     .002
X              .644     .038       .865    17.071     .000

a. Dependent variable: Y
[Figure: scatterplot of Y versus X]
Fitted Regression
Ŷi = 1.02 + .64 Xi,  R² = .74
s.e.: (.037), t-value: 17.07

The regression coefficient of X is significant (at the 5% significance level), with the expected value of Y increasing by .64 for each unit increase in X. The 95% confidence interval for the regression coefficient is
[.64 − 1.96 × .037, .64 + 1.96 × .037] = [.57, .71]
74% of the variation of Y is explained by the variation of X.
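The confidence-interval arithmetic spelled out, with the coefficient and standard error taken from the fitted regression above:

```python
b, se = 0.64, 0.037          # slope and its standard error from the output above
z = 1.96                     # normal critical value for a 95% interval
lower, upper = b - z * se, b + z * se
print(round(lower, 2), round(upper, 2))  # 0.57 0.71
```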
Fitted regression line
Residual plot
OLS analysis
Variance decomposition
Properties of OLS estimation
Sampling distributions
Inferences
Student-t distribution
Significance tests
F-test
Prediction of Y
Multiple Regression:
t-test and CI
F-test
Confidence bounds for Y
Interpreting multiple regression by means of simple regression
Adjusted R2
Example of Regression Analysis
Data: paisos.sav
Variables
Matrix Plot
Esperança de vida (1992) — life expectancy (1992)
Calories diàries / habitant — daily calories per inhabitant
PIB per càpita en $ — GDP per capita in $
Habitants per metge — inhabitants per doctor
The variables need to be transformed!
Variable transformation
Matrix Plot
(Approximately) linear relations between the transformed variables!
Esperança de vida (1992)
Calories diàries / habitant
LOGPIB
LOGMETG
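Why the log transform helps can be sketched with synthetic data (not the paisos.sav data): a response that is linear in log(GDP) correlates more strongly with the transformed predictor than with the raw, skewed one. The constants below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
gdp = rng.lognormal(mean=8, sigma=1.5, size=200)        # skewed, GDP-like predictor
life = 40 + 8 * np.log10(gdp) + rng.normal(0, 2, 200)   # linear in log10(gdp)

r_raw = np.corrcoef(gdp, life)[0, 1]             # weakened by the skewness
r_log = np.corrcoef(np.log10(gdp), life)[0, 1]   # (approximately) linear relation
print(r_raw, r_log)
```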
Regression Analysis — Model summary(b)

Model 1: R = .897(a), R² = .805, Adjusted R² = .800, Std. error of the estimate = 4.7680
a. Predictors: (Constant), LOGMETG, Calories diàries / habitant, LOGPIB
b. Dependent variable: Esperança de vida (1992)
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT espvida
  /METHOD=ENTER calories logpib logmetg
  /PARTIALPLOT ALL
  /SCATTERPLOT=(*ZRESID ,*ZPRED )
  /SAVE RESID .
Coefficients(a)

Model 1                          B      Std. error   Beta       t       Sig.
(Constant)                    73.230      8.294                8.830    .000
Calories diàries / habitant     .000       .001      .007       .094    .925
LOGPIB                         6.184      1.158      .401      5.341    .000
LOGMETG                       -8.586      1.287     -.534     -6.671    .000

a. Dependent variable: Esperança de vida (1992)
Casewise diagnostics(a)

Case number 37: Std. residual = 3.035, Esperança de vida (1992) = 71.9

a. Dependent variable: Esperança de vida (1992)
Residuals vs fitted y:

[Figure: scatterplot of regression standardized residuals versus regression standardized predicted values; dependent variable: Esperança de vida (1992); points labelled by case number]
Partial regression:

[Figure: partial regression plot — dependent variable: Esperança de vida (1992); X axis: Calories diàries / habitant]
Partial regression:

[Figure: partial regression plot — dependent variable: Esperança de vida (1992); X axis: LOGPIB]
Partial regression:

[Figure: partial regression plot — dependent variable: Esperança de vida (1992); X axis: LOGMETG]
Regression Analysis

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT espvida
  /METHOD=ENTER calories logpib logmetg cal2
  /PARTIALPLOT ALL
  /SCATTERPLOT=(*ZRESID ,*ZPRED )
  /SAVE RESID .
Model summary(b)

Model 1: R = .903(a), R² = .816, Adjusted R² = .809, Std. error of the estimate = 4.6550
a. Predictors: (Constant), CAL2, LOGPIB, LOGMETG, Calories diàries / habitant
b. Dependent variable: Esperança de vida (1992)
Coefficients(a)

Model 1                          B         Std. error   Beta       t       Sig.
(Constant)                    47.394       12.852                3.688    .000
Calories diàries / habitant     .019         .007      1.027     2.564    .012
LOGPIB                         5.951        1.134       .386     5.248    .000
LOGMETG                       -8.342        1.260      -.519    -6.620    .000
CAL2                        -3.34E-06        .000     -1.002    -2.589    .011

a. Dependent variable: Esperança de vida (1992)
Casewise diagnostics(a)

Case number 37: Std. residual = 3.040, Esperança de vida (1992) = 71.9
Case number 99: Std. residual = -3.009, Esperança de vida (1992) = 53.5

a. Dependent variable: Esperança de vida (1992)
Residuals vs fitted y:

[Figure: scatterplot of regression standardized residuals versus regression standardized predicted values; dependent variable: Esperança de vida (1992)]
Case statistics
Case misfit
Potential for influence: Leverage
Influence
Residuals
Hat matrix
Influence Analysis
Diagnostic case statistics

After fitting a regression, use the instruction
  fpredict namevar                 predicted value of y
  fpredict namevar, cooksd         Cook's D influence measure
  fpredict namevar, dfbeta(x1)     DFBETAS for the regression coefficient on var x1
  fpredict namevar, dfits          DFFITS influence measure
  fpredict namevar, hat            diagonal elements of the hat matrix (leverage)
  fpredict namevar, leverage       (same as hat)
  fpredict namevar, residuals      residuals
  fpredict namevar, rstandard      standardized residuals
  fpredict namevar, rstudent       Studentized residuals
  fpredict namevar, stdf           standard error of predicted individual y (standard error of forecast)
  fpredict namevar, stdp           standard error of predicted mean y
  fpredict namevar, stdr           standard error of residuals
  fpredict namevar, welsch         Welsch's distance influence measure

. display tprob(47, 2.337)
. sort namevar
. list x1 x2 in -5/l
Leverages
after fit …
. fpredict lev, leverage
. gsort -lev
. list nombre lev in 1/5

      nombre                                     lev
 1.   RECAGUREN SOCIEDAD LIMITADA            .0803928
 2.   EL MARQUES DEL AMPURDAN S,L,           .0767692
 3.   ADOBOS Y DERIVADOS S,L,                .0572497
 4.   CONSERVAS GARAVILLA SA                 .0549707
 5.   PLANTA DE ELABORADOS ALIMENTICIOS MA   .0531497

Box plot of leverage values
. predict lev, leverage
. graph lev, box s([_n])

Cases with extreme leverages

Leverage versus residual square plot
. lvr2plot, s([_n])

Dfbeta's:
. fpredict beta, dfbeta(nt_paau)
. graph beta, box s([_n])
Regression: Outliers, basic idea
Outlier

Regression: Outliers, indicators

Indicator   Description                                                       Rule of thumb (when "wrong")
Resid       Residual: actual − predicted                                      -
ZResid      Standardized residual: residual divided by the residual           > 3 (in absolute value)
            standard deviation
SResid      Studentized residual: residual divided by the residual standard   > 3 (in absolute value)
            deviation at that particular point of X values
DResid      Difference between the residual and the residual when the         -
            data point is deleted
SDResid     See DResid, standardized by the standard deviation at that        > 3 (in absolute value)
            particular point of X values
Regression: Outliers, in SPSS
Regression: Influential Points, Basic Idea
Influential point (not an outlier!)
Regression: Influential Points, Indicators

Indicator   Description                                                       Rule of thumb
Lever       Distance between the point and the remaining points.              > 2 (p+1) / n
            NB: potentially influential
Cook        Change in the residuals of the other cases when a given case      > 1
            is left out of the regression
DfBeta      Difference between the betas when the case is included and        -
            when it is left out of the model. NB: we get one for each beta
SDfBeta     DfBeta / standard error of DfBeta. NB: for each beta              > 2 / √n
DfFit       Difference between the predicted value when the case is           -
            included versus left out of the model
SDfFit      DfFit / standard deviation of DfFit                               > 2 √(p/n)
CovRatio    Change in the variances/covariances when the point is left out    |CovRatio − 1| > 3 p / n
Regression: Influential points, in SPSS
Case 2 is an influential point
Regression: Influential Points, what to do?
• Nothing at all..
• Check data
• Delete atypical data points, then repeat the regression without them
"Whether or not to delete a point is an issue statisticians disagree on"
MULTICOLLINEARITY
Diagnostic tools
Regression: Multicollinearity
• If the predictors are highly correlated, we speak of multicollinearity
• Is this a problem? If you want to assess the influence of each predictor, yes it is, because:
  – standard errors blow up, making coefficients non-significant
Analyzing math data
. use "G:\Albert\COURSES\cursDAS\AS2003b\data\mat.dta", clear
. save "G:\Albert\COURSES\CursMetEstad\Curs2004\Metodes\mathdata.dta"file G:\Albert\COURSES\CursMetEstad\Curs2004\Metodes\mathdata.dta saved
. edit
- preserve

. gen perform = (nt_m1 + nt_m2 + nt_m3)/3
(110 missing values generated)

. corr perform nt_paau nt_acces nt_exp
(obs=189)

         | perform  nt_paau nt_acces   nt_exp
---------+------------------------------------
 perform |  1.0000
 nt_paau |  0.3535   1.0000
nt_acces |  0.5057   0.8637   1.0000
  nt_exp |  0.5002   0.3533   0.7760   1.0000

. outfile nt_exp nt_paau nt_acces perform using "G:\Albert\COURSES\CursMetEstad\Curs2004\Metodes\mathdata.dat"
Multiple regression: perform vs nt_acces nt_paau
. regress perform nt_acces nt_paau

      Source |       SS       df       MS          Number of obs =     245
-------------+------------------------------       F(  2,   242) =   31.07
       Model |  71.1787647     2  35.5893823       Prob > F      =  0.0000
    Residual |  277.237348   242  1.14560888       R-squared     =  0.2043
-------------+------------------------------       Adj R-squared =  0.1977
       Total |  348.416112   244  1.42793489       Root MSE      =  1.0703

------------------------------------------------------------------------------
     perform |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nt_acces |   1.272819   .2427707    5.243   0.000     .7946054    1.751032
     nt_paau |  -.2755092   .1835091   -1.501   0.135    -.6369882    .0859697
       _cons |  -1.513124   .9729676   -1.555   0.121     -3.42969    .4034425
------------------------------------------------------------------------------

perform = performance in Mathematics I to III
Collinearity
Diagnostics for multicollinearity

. corr nt_paau nt_exp nt_acces
(obs=276)

         | nt_paau   nt_exp nt_acces
---------+---------------------------
 nt_paau |  1.0000
  nt_exp |  0.3435   1.0000
nt_acces |  0.8473   0.7890   1.0000

. fit perform nt_paau nt_exp nt_acces
. vif

  Variable |      VIF       1/VIF
-----------+----------------------
  nt_acces |   1201.85    0.000832
   nt_paau |    514.27    0.001945
    nt_exp |    384.26    0.002602
-----------+----------------------
  Mean VIF |    700.13

Any explanatory variable with a VIF greater than 5 (or a tolerance less than .2) shows a degree of collinearity that may be problematic. The ratio 1/VIF is called the tolerance.

In the case of just nt_paau and nt_exp we get:

. vif

  Variable |      VIF       1/VIF
-----------+----------------------
    nt_exp |      1.14    0.875191
   nt_paau |      1.14    0.875191
-----------+----------------------
  Mean VIF |      1.14
VIF_j = 1 / (1 − R_j²), where R_j² is the R² from regressing predictor j on the remaining predictors
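The definition VIF = 1/(1 − R_j²) can be implemented directly: regress each predictor on the others and invert the unexplained share. A sketch with a hypothetical `vif` helper and made-up test data (neither is from the course materials):

```python
import numpy as np

def vif(X):
    """VIF per column of the predictor matrix X (no constant column),
    computed as 1/(1 - R_j^2) from regressing column j on the others."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        yj = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, yj, rcond=None)
        resid = yj - A @ coef
        r2 = 1 - resid @ resid / np.sum((yj - yj.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return out

x1 = np.linspace(0, 9, 20)
x2 = 2 * x1 + np.sin(x1)       # nearly collinear with x1 -> large VIFs
x3 = np.cos(3 * x1)            # mostly unrelated -> VIF near 1
v = vif(np.column_stack([x1, x2, x3]))
print(v)
```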
Multiple regression: perform vs nt_paau nt_exp
. regress perform nt_paau nt_exp

      Source |       SS       df       MS          Number of obs =     189
-------------+------------------------------       F(  2,   186) =   37.24
       Model |  75.2441994     2  37.6220997       Prob > F      =  0.0000
    Residual |  187.897174   186  1.01019986       R-squared     =  0.2859
-------------+------------------------------       Adj R-squared =  0.2783
       Total |  263.141373   188  1.39968815       Root MSE      =  1.0051

------------------------------------------------------------------------------
     perform |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     nt_paau |   .3382551   .1109104    3.050   0.003      .119451    .5570593
      nt_exp |   .9040681   .1396126    6.476   0.000     .6286403    1.179496
       _cons |  -3.295308   1.104543   -2.983   0.003    -5.474351   -1.116266
------------------------------------------------------------------------------

. predict yh
(option xb assumed; fitted values)
(82 missing values generated)

. predict e, resid
(169 missing values generated)

. corr nt_exp nt_paau nt_acces
(obs=276)

         |  nt_exp  nt_paau nt_acces
---------+---------------------------
  nt_exp |  1.0000
 nt_paau |  0.3435   1.0000
nt_acces |  0.7890   0.8473   1.0000
Regression: Multicollinearity, Indicators

Indicator               Description                                             Rule of thumb (when "wrong")
Overall F-test versus   Overall F-test is significant, but the individual       -
tests of coefficients   coefficients are not
Beta                    Standardized coefficient                                Outside [−1, +1]
Tolerance               Unique variance of a predictor (not shared/explained    < 0.01
                        by the other predictors). NB: tolerance per coefficient
Variance Inflation      √VIF indicates how much the standard error of a         > 10
Factor (VIF)            coefficient is inflated due to correlation between
                        this predictor and the others. NB: VIF per coefficient
Eigenvalues             …rather technical…                                      ≈ 0
Condition Index         …rather technical…                                      > 30
Variance Proportion     …rather technical… look at the "loadings" on the        Loadings around 1
                        dimensions
Regression: Multicollinearity, in SPSS
diagnostics
Regression: Multicollinearity, in SPSS
Beta > 1; Tolerance and VIF look fine

Regression: Multicollinearity, in SPSS
2 eigenvalues around 0 — these variables cause the multicollinearity; the Condition Index looks fine
Regression: Multicollinearity, what to do?
• Nothing… (if there is no interest in the individual coefficients, only in good prediction)
• Leave one (or more) predictor(s) out
• Use PCA to reduce highly correlated variables to a smaller number of uncorrelated variables
Categorical Variables
Use: Survey_sample.sav, in i/.../data
Salary vs gender | years of education, work status
Creating dummy variables
GET FILE='G:\Albert\Web\Metodes2005\Dades\survey_sample.sav'.
COMPUTE D1 = wrkstat=1.
EXECUTE .
COMPUTE D2 = wrkstat=2.
EXECUTE .
COMPUTE D3 = wrkstat=3.
EXECUTE .
COMPUTE D4 = wrkstat=4.
EXECUTE .
COMPUTE D5 = wrkstat=5.
EXECUTE .
COMPUTE D6 = wrkstat=6.
EXECUTE .
COMPUTE D7 = wrkstat=7.
EXECUTE .
COMPUTE D8 = wrkstat=8.
EXECUTE .
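The same dummy coding can be sketched with pandas; the variable name wrkstat and the D prefix follow the SPSS syntax above, while the example values are made up:

```python
import pandas as pd

# toy data standing in for survey_sample.sav's wrkstat (work status, coded 1-8)
df = pd.DataFrame({"wrkstat": [1, 2, 3, 8, 1]})

# one 0/1 indicator per observed category: columns D_1, D_2, D_3, D_8
dummies = pd.get_dummies(df["wrkstat"], prefix="D").astype(int)
print(dummies)
```

In a regression, one category's dummy is dropped as the reference category, exactly as the block regression below enters only D2-D8 and leaves D1 out.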
Regression in blocks
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA CHANGE
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT rincome
/METHOD=ENTER sex
/METHOD=ENTER d2 d3 d4 d5 d6 d7 d8 .
Model summary

Model    R      R²     Adj. R²   Std. error   R² change   F change   df1    df2    Sig. F change
1      .192a   .037     .036       2.806        .037       70.384      1    1847    .000
2      .496b   .246     .243       2.487        .210       73.059      7    1840    .000

a. Predictors: (Constant), Gender
b. Predictors: (Constant), Gender, D8, D3, D6, D5, D4, D7, D2
ANOVA(c)

Model 1       Sum of squares     df   Mean square      F       Sig.
Regression          554.345       1      554.345    70.384    .000(a)
Residual          14547.005    1847        7.876
Total             15101.350    1848

Model 2       Sum of squares     df   Mean square      F       Sig.
Regression         3718.200       8      464.775    75.127    .000(b)
Residual          11383.150    1840        6.186
Total             15101.350    1848

a. Predictors: (Constant), Gender
b. Predictors: (Constant), Gender, D8, D3, D6, D5, D4, D7, D2
c. Dependent variable: Respondent's income

Coefficients(a)

Model 1        B      Std. error   Beta        t       Sig.
(Constant)  11.708      .210                 55.882    .000
Gender      -1.096      .131      -.192      -8.390    .000

Model 2        B      Std. error   Beta        t       Sig.
(Constant)  11.733      .188                 62.433    .000
Gender       -.629      .119      -.110      -5.299    .000
D2          -3.016      .171      -.366     -17.681    .000
D3           -.955      .432      -.045      -2.211    .027
D4          -2.977      .377      -.161      -7.898    .000
D5          -2.860      .452      -.128      -6.330    .000
D6          -3.309      .513      -.131      -6.456    .000
D7          -4.046      .337      -.247     -11.994    .000
D8          -1.414      .882      -.032      -1.603    .109

a. Dependent variable: Respondent's income
Regression in blocks
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA CHANGE
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT rincome
/METHOD=ENTER sex
/METHOD=ENTER educ
/METHOD=ENTER d2 d3 d4 d5 d6 d7 d8 .

ANOVA(d)

Model 1       Sum of squares     df   Mean square       F       Sig.
Regression          547.844       1      547.844     69.553    .000(a)
Residual          14516.575    1843        7.877
Total             15064.420    1844

Model 2       Sum of squares     df   Mean square       F       Sig.
Regression         1590.788       2      795.394    108.739    .000(b)
Residual          13473.631    1842        7.315
Total             15064.420    1844

Model 3       Sum of squares     df   Mean square       F       Sig.
Regression         4431.483       9      492.387     84.975    .000(c)
Residual          10632.937    1835        5.795
Total             15064.420    1844

a. Predictors: (Constant), Gender
b. Predictors: (Constant), Gender, Highest year of school completed
c. Predictors: (Constant), Gender, Highest year of school completed, D3, D8, D5, D6, D4, D7, D2
d. Dependent variable: Respondent's income
Coefficients(a)

Model 1                              B      Std. error   Beta        t       Sig.
(Constant)                        11.700      .210                 55.778    .000
Gender                            -1.091      .131      -.191      -8.340    .000

Model 2                              B      Std. error   Beta        t       Sig.
(Constant)                         7.869      .379                 20.754    .000
Gender                            -1.135      .126      -.198      -8.996    .000
Highest year of school completed    .283      .024       .263      11.941    .000

Model 3                              B      Std. error   Beta        t       Sig.
(Constant)                         8.455      .342                 24.715    .000
Gender                             -.679      .115      -.119      -5.896    .000
Highest year of school completed    .241      .021       .224      11.302    .000
D2                                -2.891      .166      -.351     -17.451    .000
D3                                 -.801      .418      -.038      -1.915    .056
D4                                -2.625      .366      -.142      -7.169    .000
D5                                -2.647      .438      -.119      -6.047    .000
D6                                -3.554      .497      -.141      -7.157    .000
D7                                -3.802      .327      -.232     -11.621    .000
D8                                -1.044      .854      -.024      -1.223    .222

a. Dependent variable: Respondent's income
Model summary

Model    R      R²     Adj. R²   Std. error   R² change   F change   df1    df2    Sig. F change
1      .191a   .036     .036       2.807        .036       69.553      1    1843    .000
2      .325b   .106     .105       2.705        .069      142.582      1    1842    .000
3      .542c   .294     .291       2.407        .189       70.034      7    1835    .000

a. Predictors: (Constant), Gender
b. Predictors: (Constant), Gender, Highest year of school completed
c. Predictors: (Constant), Gender, Highest year of school completed, D3, D8, D5, D6, D4, D7, D2
Categorical Predictors
Is income dependent on years of age and religion?

Categorical Predictors
Compute a dummy variable for each category, except the last
Categorical Predictors
And so on…
Categorical Predictors
Block 1
Categorical Predictors
Block 2
Categorical Predictors
Ask for R2 change
Categorical Predictors
Model Summary

Model    R      R²     Adj. R²   Std. error   R² change   F change   df1    df2    Sig. F change
1      .101a   .010     .010       5.424        .010       14.688      1    1421    .000
2      .172b   .030     .026       5.379        .019        7.064      4    1417    .000

a. Predictors: (Constant), Age of Respondent
b. Predictors: (Constant), Age of Respondent, Jewish, Cath, None, Prot
Look at the R Square change for the importance of the categorical variable
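The "Sig. F Change" column tests that R² change; the F statistic behind it can be sketched as follows. The inputs are read off the Model Summary above; the result differs slightly from SPSS's 7.064 because the printed R² values are rounded:

```python
def f_change(r2_full, r2_reduced, df1, df2):
    """F statistic for the R^2 increase when df1 predictors are added."""
    return ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)

# Model 1 -> Model 2: four religion dummies added (df1 = 4, df2 = 1417)
f = f_change(0.030, 0.010, 4, 1417)
print(f)  # about 7.3 (SPSS reports 7.064 from unrounded R-squares)
```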