Multiple Linear Regression Using SAS


Page 1: Multiple Linear Regression

Multiple Linear Regression Using SAS

Page 2: Multiple Linear Regression

Hello! I am Vamshi Krishna. I am here because I love to give presentations.

Page 3: Multiple Linear Regression

Assumptions of Linear Regression

Page 4: Multiple Linear Regression

Assumptions of Linear Regression

Linear relationship
Multivariate normality
No or little multicollinearity
No auto-correlation
Homoscedasticity
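Several of these assumptions can be checked directly in SAS. A minimal sketch, using a hypothetical dataset mydata with response y and predictors x1 and x2:

ods graphics on;
proc reg data=mydata plots=diagnostics;
   * VIF flags multicollinearity and DW tests residual auto-correlation;
   * the diagnostic plot panel covers linearity, normality, homoscedasticity;
   model y = x1 x2 / vif dw;
run;
quit;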

Page 5: Multiple Linear Regression

Multiple linear regression

Page 6: Multiple Linear Regression

What is multiple linear regression?

Multiple linear regression attempts to model the relationship between two or more explanatory variables and a response variable by fitting a linear equation to observed data. Every value of the independent variable x is associated with a value of the dependent variable y.
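In symbols, the fitted equation has the standard form

Y = β0 + β1X1 + β2X2 + … + βpXp + ε

where β0 is the intercept, each βi is an estimated coefficient for explanatory variable Xi, and ε is the random error term.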

Page 7: Multiple Linear Regression

Regression Analysis

What is regression analysis used for?

In statistics, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analysing several variables when the focus is on the relationship between a dependent variable and one or more independent variables.

When do you use regression analysis?

Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables.

Page 8: Multiple Linear Regression

Multiple Linear Regression: DIAGNOSTICS

RMSE

Lower values of RMSE indicate better fit. RMSE is a good measure of how accurately the model predicts the response, and is the most important criterion for fit if the main purpose of the model is prediction.
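Concretely, with n observations and p estimated parameters (including the intercept),

RMSE = sqrt( SSE / (n − p) )

where SSE is the error (residual) sum of squares, so RMSE is reported in the same units as the response.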

The meaning of R2

The value R2 is a fraction between 0.0 and 1.0, and has no units. An R2 value of 0.0 means that knowing X does not help you predict Y: there is no linear relationship between X and Y, and the best-fit line is a horizontal line through the mean of all Y values. When R2 equals 1.0, all points lie exactly on a straight line with no scatter, and knowing X lets you predict Y perfectly.
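In terms of sums of squares, R2 = 1 − SSE/SST, where SST is the total sum of squares of Y about its mean.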

Adjusted R2

The adjusted R-squared is a modified version of R-squared that has been adjusted for the number of predictors in the model. The adjusted R-squared increases only if the new term improves the model more than would be expected by chance. It decreases when a predictor improves the model by less than expected by chance.
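With n observations and k predictors, Adjusted R2 = 1 − (1 − R2)(n − 1)/(n − k − 1), so an added predictor must reduce SSE enough to offset the lost degree of freedom.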

Page 9: Multiple Linear Regression

Introduction:

Independent variables: Q1, Q2, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q11, Q12, Gender, Age_Group, Monthly_Spent, Family_Size, Income_group, Profession, Education.

Dependent variable: Price

Page 10: Multiple Linear Regression

Multiple Linear Regression: SAS CODE:
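The slide shows the SAS code as a screenshot, so the exact statements are not recoverable here. A minimal sketch of what it likely resembles, assuming the input dataset is named survey (a hypothetical name) and that the categorical variables (Gender, Profession, etc.) are numerically coded, since PROC REG accepts only numeric regressors:

* Stepwise selection at the 0.05 significance level;
proc reg data=survey;
   model Price = Q1-Q12 Gender Age_Group Monthly_Spent
                 Family_Size Income_group Profession Education
                 / selection=stepwise slentry=0.05 slstay=0.05;
run;
quit;

SLENTRY= and SLSTAY= set the entry and stay significance levels (PROC REG's stepwise defaults are 0.15); with character-coded categories, PROC GLMSELECT with a CLASS statement would be used instead.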

Page 11: Multiple Linear Regression

Multiple Linear Regression:

PROCEDURE & DIMENSIONS:

To get the best-fit model, I used a significance level of 0.05 for removing non-significant and multicollinear variables.

Page 12: Multiple Linear Regression

Multiple Linear Regression:

STEPWISE REGRESSION

Combines forward and backward selection. At each step, variables may be entered or removed if they meet certain criteria. Useful for developing the best prediction equation from the smallest number of variables; redundant predictors are removed. Computer-driven, and therefore controversial.

Page 13: Multiple Linear Regression

Multiple Linear Regression:

MODEL SUMMARY

The model was built in 9 steps using stepwise selection.

The partial R2 at each step is shown in Picture 1.2 below.

Page 14: Multiple Linear Regression

Multiple Linear Regression:

Rules for AIC, BIC & Mallows's C(p)

The model with the smaller AIC is considered the better-fitting model, and AIC can be negative. To choose the model we use the criterion of lower AIC (-230.2E+4).

If BIC is positive, the saturated model (i.e. the model with one parameter for every case; the BIC for a saturated model will equal 0) is preferred (i.e. the more complex model is better). When BIC is negative, the current model is preferred. The more negative the BIC, the better the fit.
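For a model with p parameters, n observations, and error sum of squares SSE, the usual regression forms of these criteria are

AIC = n·ln(SSE/n) + 2p
SBC (Schwarz's BIC) = n·ln(SSE/n) + p·ln(n)
C(p) = SSE_p / MSE_full − n + 2p

where MSE_full is the mean squared error of the full model; a C(p) value close to p suggests the submodel is nearly unbiased. (Note that PROC REG reports Schwarz's criterion under the label SBC.)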

Condition Index – the condition index is calculated from an eigenvalue analysis of the independent variables. Values of 10-30 indicate moderate multicollinearity in the regression variables; values > 30 indicate strong multicollinearity.

Page 15: Multiple Linear Regression

Multiple Linear Regression:

RULE OF THUMB FOR INTERPRETATION OF R2

This model's R2 and Adj. R2 relation looks like Picture 1.2 below.

.00 = no linear relationship
.10 = small (R ~ .3)
.25 = moderate (R ~ .5)
.50 = strong (R ~ .7)
1.00 = perfect linear relationship

In our model R2 is 0.9103, which is close to 1.00, so the model shows a near-perfect linear relationship.

Picture 1.2

Page 16: Multiple Linear Regression

Multiple Linear Regression:

MODEL SUMMARY

A significant result means the difference is large, so we reject the null hypothesis and accept the alternative (Ha).

In statistics, the Bayesian information criterion (BIC) or Schwarz criterion (also SBC, SBIC) is a criterion for model selection among a finite set of models.

Page 17: Multiple Linear Regression

Multiple Linear Regression:

MODEL SELECTION

Page 18: Multiple Linear Regression

Multiple Linear Regression:

SELECTED MODEL ANOVA

Here the F and p values satisfy the rules. If the error (residual) is close to zero the model is good, so this model is good.

no auto-correlation

Page 19: Multiple Linear Regression

Multiple Linear Regression:

DURBIN-WATSON -- AUTOCORRELATION

Linear regression analysis requires that there be little or no autocorrelation in the data. Autocorrelation occurs when the residuals are not independent of each other; in other words, when the value of y(x+1) is not independent of the value of y(x).

We can test the linear regression model for autocorrelation with the Durbin-Watson test. Durbin-Watson's d tests the null hypothesis that the residuals are not linearly auto-correlated. While d can assume values between 0 and 4, values around 2 indicate no autocorrelation. As a rule of thumb, values of 1.5 < d < 2.5 show that there is no auto-correlation in the data. However, the Durbin-Watson test analyses only linear autocorrelation, and only between direct neighbors (first-order effects).
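In PROC REG the statistic is requested with the DW option on the MODEL statement. A sketch, reusing the assumed survey dataset from the code slide:

proc reg data=survey;
   * DW prints the Durbin-Watson d statistic for the fitted model;
   model Price = Q1-Q12 Gender Age_Group Monthly_Spent
                 Family_Size Income_group Profession Education / dw;
run;
quit;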

Page 20: Multiple Linear Regression

Multiple Linear Regression:

SELECTED MODEL SUMMARY

If the chi-square value is greater than or equal to the critical value, there is a significant difference between the groups we are studying. That is, the difference between the actual data and the expected data (which assumes the groups aren't different) is probably too great to be attributed to chance, so we conclude that our sample supports the hypothesis of a difference.

Critical value: 95th percentile of the chi-squared distribution with 400 DF = 447.63246783
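The quoted critical value can be reproduced with SAS's CINV quantile function:

data _null_;
   crit = cinv(0.95, 400);  * 95th percentile of chi-square with 400 DF;
   put crit=;               * prints approximately 447.632;
run;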

Here all diagnostics are satisfied as per the rules, so we conclude that the model is the best-fit model.

 RMSE is an absolute measure of fit.

Use PRESS to assess your model's predictive ability. Usually, the smaller the PRESS value, the better the model's predictive ability. PRESS is used to calculate the predicted R2 which is usually more intuitive to interpret.
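Predicted R2 is computed from PRESS as 1 − PRESS/SST, where SST is the total sum of squares; in PROC REG, the PRESS statistic can be requested with the PRESS option on the MODEL statement.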

Page 21: Multiple Linear Regression

Multiple Linear Regression:

[Diagram: PREDICTOR (I.V.) → MULTIPLE LINEAR REGRESSION (PROCESS) → OUTCOME (D.V.)]

Page 22: Multiple Linear Regression

ONE Dependent Variable

95% Confidence Interval?

NINE Independent Variables

Page 23: Multiple Linear Regression

Multiple Linear Regression:

Page 24: Multiple Linear Regression

Our process is easy:

Data Cleaning
Data Mining Techniques
Interpretation

Page 25: Multiple Linear Regression

Multiple Linear Regression: After completion of the model, the predictor variables are as follows:

Widely Available
Quantity purchase option
Quality comparable to Branded items
Good shelf life
Quality Conscious
Brand Loyalty
Age_group
Monthly_Spent
Education

Page 26: Multiple Linear Regression

Multiple Linear Regression:

PARAMETER ESTIMATES FOR ALL INDEPENDENT VARIABLES

Page 27: Multiple Linear Regression

Multiple Linear Regression:

◇ USING PARAMETER ESTIMATES TO INTERPRET THE RESULT

Page 28: Multiple Linear Regression

Multiple Linear Regression:

INTERPRETATION

Widely Available

🔸 As people respond strong-positive or weak-negative, the weight increases; otherwise it decreases.

Quantity purchase option

🔸 As people's judgement on Q2 varies, Total_weight decreases with it (the relationship is directly proportional).

Quality comparable to Branded items

🔸 The weight depends upon the firmness of people's decision (if they agree, the weight decreases; otherwise it increases).

Good shelf life

🔸 As people respond strong-positive or weak-negative, the weight increases; otherwise it decreases.

Quality Conscious

🔸 As people respond strong-positive or strong-negative, the weight increases; otherwise it decreases.

Brand Loyalty

🔸 As people respond strong-positive or weak-negative, the weight increases; otherwise it decreases.

Page 29: Multiple Linear Regression

Multiple Linear Regression:

INTERPRETATION

Age_group

If the people are younger or older, the weight is high; otherwise the weight decreases.

Monthly_Spent

As people's monthly spend increases, the weight also increases.

Education

If the person is a graduate or post-graduate, the weight is high; if the person is inter/SSC or a professional, the weight is low.

Page 30: Multiple Linear Regression

Credits

Special thanks to Mr. K. Venkat Rao, Director of Reachout Business Analytics Pvt Ltd, for his guidance and support all through my career.

Page 31: Multiple Linear Regression

Reference

Sas.com
Wikipedia.com
Statisticssolutions.com
Youtube.com

Page 32: Multiple Linear Regression

Thanks! Any questions? You can find me at [email protected]