TRANSCRIPT
Multiple Linear Regression Using SAS
Hello! I am Vamshi Krishna. I am here because I love to give presentations.
Assumptions of Linear Regression
Linear relationship
Multivariate normality
No or little multicollinearity
No auto-correlation
Homoscedasticity
Multiple linear regression
What is a multiple linear regression?
Multiple linear regression attempts to model the relationship between two or more explanatory variables and a response variable by fitting a linear equation to observed data. Every value of the independent variable x is associated with a value of the dependent variable y.
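To make the fitting step concrete, here is a minimal Python sketch (not part of the original SAS project; the data and variable names are made up) that fits a linear equation to observed data by least squares:

```python
import numpy as np

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = 1 + 2 * x1 + 3 * x2

# Design matrix: an intercept column plus the two explanatory variables
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares estimate of the coefficients
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # intercept, slope of x1, slope of x2
```

Each coefficient estimates the change in y per unit change in that explanatory variable, holding the others fixed.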
When do you use regression analysis?
Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables.
Regression Analysis
What is regression analysis used for?
In statistics, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analysing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables.
Multiple Linear Regression: DIAGNOSTICS
RMSE
Lower values of RMSE indicate better fit. RMSE is a good measure of how accurately the model predicts the response, and is the most important criterion for fit if the main purpose of the model is prediction.
The meaning of R2
The value r2 is a fraction between 0.0 and 1.0, and has no units. An r2 value of 0.0 means that knowing X does not help you predict Y. There is no linear relationship between X and Y, and the best-fit line is a horizontal line going through the mean of all Y values. When r2 equals 1.0, all points lie exactly on a straight line with no scatter. Knowing X lets you predict Y perfectly.
Adjusted R2
The adjusted R-squared is a modified version of R-squared that has been adjusted for the number of predictors in the model. The adjusted R-squared increases only if the new term improves the model more than would be expected by chance. It decreases when a predictor improves the model by less than expected by chance.
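These three diagnostics are easy to compute from the residuals; below is a small Python illustration on made-up predictions (the numbers are hypothetical, chosen only to show the formulas):

```python
import numpy as np

# Hypothetical observations and model predictions
y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
n, p = len(y), 2               # p = number of predictors (assumed)

sse = np.sum((y - y_hat) ** 2)        # sum of squared errors
sst = np.sum((y - y.mean()) ** 2)     # total sum of squares

rmse = np.sqrt(sse / n)               # one common definition of RMSE
r2 = 1 - sse / sst                    # coefficient of determination
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(rmse, 4), round(r2, 4), round(adj_r2, 4))  # 0.1581 0.98 0.94
```

Note that SAS's Root MSE divides by the error degrees of freedom (n − p − 1) rather than by n, so its value is slightly larger than the RMSE shown here.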
Introduction:
Independent variables: Q1, Q2, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q11, Q12, Gender, Age_Group, Monthly_Spent, Family_Size, Income_group, Profession, Education.
Dependent variable: Price
Multiple Linear Regression: SAS CODE:
PROCEDURE & DIMENSIONS:
To get the best-fit model, I used a significance level of 0.05 to remove non-significant and multicollinear variables.
STEPWISE REGRESSION
Combines forward & backward selection. At each step, variables may be entered or removed if they meet certain criteria. Useful for developing the best prediction equation from the smallest number of variables. Redundant predictors are removed. Computer-driven, and somewhat controversial.
MODEL SUMMARY
The model was built in 9 steps using stepwise selection.
The partial R2 at each step is shown in picture 1.2.
Rules for AIC, BIC & Mallows's C(p)
The model with the smaller AIC is considered the better-fitting model, and AIC can be negative. To choose the model we use the criterion of lower AIC (-230.2E+4).
If BIC is positive, the saturated model (i.e. the model with one parameter for every case; the BIC for a saturated model equals 0) is preferred (i.e. the more complex model is better). When BIC is negative, the current model is preferred; the more negative the BIC, the better the fit. As a rule of thumb, Mallows's C(p) should be small and close to the number of parameters p for a well-specified model.
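A hedged numeric illustration of these rules, using made-up fits and the common Gaussian-likelihood forms of AIC and BIC for OLS:

```python
import numpy as np

def aic_bic(y, y_hat, k):
    """AIC = n*ln(RSS/n) + 2k, BIC = n*ln(RSS/n) + k*ln(n); k = parameters."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + 2 * k, n * np.log(rss / n) + k * np.log(n)

# Hypothetical fits: model B fits better but spends more parameters
y     = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fit_a = [1.2, 1.9, 3.1, 3.8, 5.2, 5.9]     # model A, 2 parameters
fit_b = [1.1, 1.95, 3.05, 3.9, 5.1, 5.95]  # model B, 4 parameters

aic_a, bic_a = aic_bic(y, fit_a, 2)
aic_b, bic_b = aic_bic(y, fit_b, 4)
# Model B's fit improvement outweighs the parameter penalty here,
# so it has the smaller (more negative) AIC and BIC.
print(round(aic_a, 2), round(aic_b, 2))  # -18.13 -22.45
```

Both criteria come out negative, which is perfectly normal; the smaller value still marks the preferred model.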
Condition index – the condition index is calculated using a factor analysis on the independent variables. Values of 10–30 indicate moderate multicollinearity in the regression variables; values > 30 indicate strong multicollinearity.
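The condition index can be sketched from the eigenvalues of the scaled cross-product matrix of the predictors. A minimal Python illustration with an artificially collinear predictor (all names hypothetical; SAS's COLLIN diagnostics also include the intercept column):

```python
import numpy as np

# Hypothetical predictors: x3 is nearly a copy of x1, forcing multicollinearity
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.01 * rng.normal(size=200)     # almost collinear with x1

X = np.column_stack([x1, x2, x3])
# Scale each column to unit length before computing condition indices
Xs = X / np.linalg.norm(X, axis=0)

eigvals = np.linalg.eigvalsh(Xs.T @ Xs)
cond_index = np.sqrt(eigvals.max() / eigvals.min())
print(round(cond_index, 1))  # well above 30, flagging strong multicollinearity
```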
RULE OF THUMB FOR INTERPRETATION OF R2
This model's R2 and Adj. R2 relation is shown in the picture below.
.00 = no linear relationship
.10 = small (R ~ .3)
.25 = moderate (R ~ .5)
.50 = strong (R ~ .7)
1.00 = perfect linear relationship
In our model R2 is 0.9103, which is close to 1.00, so the model shows a very strong linear relationship.
Picture 1.2
MODEL SUMMARY
If the test is significant, the difference is large, so we reject the null hypothesis and accept the alternative hypothesis.
In statistics, the Bayesian information criterion (BIC) or Schwarz criterion (also SBC, SBIC) is a criterion for model selection among a finite set of models
MODEL SELECTION
SELECTED MODEL ANOVA
Here the F and p values are satisfied as per the rules, and the error (residual) is close to zero, so the model is good. There is also no auto-correlation.
DURBIN-WATSON -- AUTOCORRELATION
Linear regression analysis requires that there be little or no autocorrelation in the data. Autocorrelation occurs when the residuals are not independent of each other, i.e. when the value of y(x+1) is not independent of the value of y(x).
We can test the linear regression model for autocorrelation with the Durbin-Watson test. Durbin-Watson's d tests the null hypothesis that the residuals are not linearly auto-correlated. While d can assume values between 0 and 4, values around 2 indicate no autocorrelation. As a rule of thumb, values of 1.5 < d < 2.5 show that there is no autocorrelation in the data. Note, however, that the Durbin-Watson test analyses only linear autocorrelation, and only between direct neighbours (first-order effects).
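The statistic itself is simple to compute; here is a small Python sketch on hand-made residual series (the values are illustrative only):

```python
import numpy as np

def durbin_watson(resid):
    """d = sum (e_t - e_{t-1})^2 / sum e_t^2; values near 2 mean no autocorrelation."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Hand-made residual series (hypothetical)
pos = [1, 1, 1, -1, -1, -1]   # positively autocorrelated -> d well below 2
neg = [1, -1, 1, -1, 1, -1]   # negatively autocorrelated -> d well above 2

print(round(durbin_watson(pos), 3))  # 0.667
print(round(durbin_watson(neg), 3))  # 3.333
```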
SELECTED MODEL SUMMARY
If the Chi-square value is greater than or equal to the critical value, there is a significant difference between the groups we are studying. That is, the difference between the actual data and the expected data (which assumes the groups aren't different) is probably too great to be attributed to chance. So we conclude that our sample supports the hypothesis of a difference.
Critical value: 95th percentile of the Chi-squared distribution with 400 DF = 447.63246783
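The 447.63 figure can be sanity-checked without tables using the Wilson-Hilferty approximation to chi-squared quantiles (a standard approximation; the standard-normal 95th-percentile value is hard-coded below):

```python
from math import sqrt

def chi2_ppf_wh(z, df):
    """Wilson-Hilferty approximation to the chi-squared quantile:
    chi2_p ~ df * (1 - 2/(9*df) + z*sqrt(2/(9*df)))**3, z = normal quantile."""
    return df * (1 - 2 / (9 * df) + z * sqrt(2 / (9 * df))) ** 3

z_95 = 1.6448536269514722   # 95th percentile of the standard normal
print(round(chi2_ppf_wh(z_95, 400), 1))  # ~447.6, matching the table value above
```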
Here all diagnostics are satisfied as per the rules, so we conclude that the model is the best-fit model.
RMSE is an absolute measure of fit.
Use PRESS to assess your model's predictive ability. Usually, the smaller the PRESS value, the better the model's predictive ability. PRESS is used to calculate the predicted R2 which is usually more intuitive to interpret.
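PRESS can be computed without refitting the model n times by using the standard leave-one-out shortcut for OLS; here is a Python sketch on synthetic data (all names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1 + 2 * x1 - x2 + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Leave-one-out shortcut: PRESS = sum (e_i / (1 - h_ii))^2,
# where h_ii are the diagonal entries of the hat matrix X(X'X)^-1 X'
h = np.sum(X * (X @ np.linalg.inv(X.T @ X)), axis=1)
press = np.sum((resid / (1 - h)) ** 2)

sst = np.sum((y - y.mean()) ** 2)
pred_r2 = 1 - press / sst            # predicted R-squared
print(press > np.sum(resid ** 2))    # True: PRESS always exceeds in-sample SSE
```

Because each leverage h_ii lies between 0 and 1, PRESS is always larger than the in-sample SSE, which is why predicted R2 is a stricter (and usually more honest) measure than R2.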
MULTIPLE LINEAR REGRESSION (PROCESS)
One dependent variable (outcome); nine independent variables (predictors); 95% confidence interval.
Our process is easy: Data Cleaning, Data Mining Techniques, Interpretation.
After completion of the model, the predictor variables are as follows: Widely Available, Quantity purchase option, Quality comparable to Branded items, Good shelf life, Quality Conscious, Brand Loyality, Age_group, Monthly_Spent, Education.
PARAMETER ESTIMATES FOR ALL INDEPENDENT VARIABLES
USING PARAMETER ESTIMATES TO INTERPRET THE RESULT
INTERPRETATION
Widely Available
When responses are strongly positive or weakly negative, the weight increases; otherwise it decreases.
Quantity purchase option
As people's judgement towards Q2 varies, Total_weight also decreases (the relationship is directly proportional).
Quality comparable to Branded items
The weight depends on the firmness of people's decision (if they agree, the weight decreases; otherwise it increases).
Good shelf life
When responses are strongly positive or weakly negative, the weight increases; otherwise it decreases.
Quality Conscious
When responses are strongly positive or strongly negative, the weight increases; otherwise it decreases.
Brand Loyality
When responses are strongly positive or weakly negative, the weight increases; otherwise it decreases.
INTERPRETATION
Age_group
If the people are younger or older, the weight is high; otherwise the weight decreases.
Monthly_Spent
As people's monthly spend increases, the weight also increases.
Education
If the person is a graduate or postgraduate, the weight is high; if the person is inter/SSC or a professional, the weight is low.
Credits
Special thanks to Mr. K. Venkat Rao, Director of Reachout Business Analytics Pvt Ltd, for his guidance and support throughout my career.
Reference
sas.com, Wikipedia, statisticssolutions.com, YouTube
Thanks! Any questions? You can find me at [email protected]