TRANSCRIPT
Multiple Linear Regression Using SAS
Hello! I am Vamshi Krishna. I am here because I love to give presentations.
Assumptions of Linear Regression
Linear relationship
Multivariate normality
No or little multicollinearity
No auto-correlation
Homoscedasticity
Multiple linear regression
What is a multiple linear regression?
Multiple linear regression attempts to model the relationship between two or more explanatory variables and a response variable by fitting a linear equation to observed data. Every value of the independent variable x is associated with a value of the dependent variable y.
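To make the fitting step concrete, here is a minimal Python sketch (not part of the original SAS project; the data and variable names are made up) that fits a linear equation to observed data by least squares:

```python
import numpy as np

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = 1 + 2 * x1 + 3 * x2

# Design matrix: an intercept column plus the two explanatory variables
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares estimate of the coefficients
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # intercept, slope of x1, slope of x2
```

Each coefficient estimates the change in y per unit change in that explanatory variable, holding the others fixed.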
When do you use regression analysis?
Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables.
Regression Analysis
What is regression analysis used for?
In statistics, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analysing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables.
Multiple Linear Regression: DIAGNOSTICS
RMSE
Lower values of RMSE indicate better fit. RMSE is a good measure of how accurately the model predicts the response, and is the most important criterion for fit if the main purpose of the model is prediction.
The meaning of R2
The value r2 is a fraction between 0.0 and 1.0, and has no units. An r2 value of 0.0 means that knowing X does not help you predict Y. There is no linear relationship between X and Y, and the best-fit line is a horizontal line going through the mean of all Y values. When r2 equals 1.0, all points lie exactly on a straight line with no scatter. Knowing X lets you predict Y perfectly.
Adjusted R2
The adjusted R-squared is a modified version of R-squared that has been adjusted for the number of predictors in the model. The adjusted R-squared increases only if the new term improves the model more than would be expected by chance. It decreases when a predictor improves the model by less than expected by chance.
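These three diagnostics are easy to compute from the residuals; below is a small Python illustration on made-up predictions (the numbers are hypothetical, chosen only to show the formulas):

```python
import numpy as np

# Hypothetical observations and model predictions
y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
n, p = len(y), 2               # p = number of predictors (assumed)

sse = np.sum((y - y_hat) ** 2)        # sum of squared errors
sst = np.sum((y - y.mean()) ** 2)     # total sum of squares

rmse = np.sqrt(sse / n)               # one common definition of RMSE
r2 = 1 - sse / sst                    # coefficient of determination
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(rmse, 4), round(r2, 4), round(adj_r2, 4))  # 0.1581 0.98 0.94
```

Note that SAS's Root MSE divides by the error degrees of freedom (n − p − 1) rather than by n, so its value is slightly larger than the RMSE shown here.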
Introduction:
Independent variables: Q1, Q2, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q11, Q12, Gender, Age_Group, Monthly_Spent, Family_Size, Income_group, Profession, Education.
Dependent variable: Price
Multiple Linear Regression: SAS CODE:
PROCEDURE & DIMENSIONS:
To get the best-fit model, I used a significance level of 0.05 to remove non-significant and multicollinear variables.
STEPWISE REGRESSION
Combines forward & backward selection. At each step, variables may be entered or removed if they meet certain criteria. Useful for developing the best prediction equation from the smallest number of variables. Redundant predictors are removed. Computer-driven, and somewhat controversial.
MODEL SUMMARY
The model was built in 9 steps using stepwise selection.
The partial R2 at each step is shown in picture 1.2.
Rules for AIC, BIC & Mallows's C(p)
The model with the smaller AIC is considered the better-fitting model, and AIC can be negative. To choose the model we use the criterion of lower AIC (-230.2E+4).
If BIC is positive, the saturated model (i.e. the model with one parameter for every case; the BIC for a saturated model equals 0) is preferred (i.e. the more complex model is better). When BIC is negative, the current model is preferred; the more negative the BIC, the better the fit. As a rule of thumb, Mallows's C(p) should be small and close to the number of parameters p for a well-specified model.
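A hedged numeric illustration of these rules, using made-up fits and the common Gaussian-likelihood forms of AIC and BIC for OLS:

```python
import numpy as np

def aic_bic(y, y_hat, k):
    """AIC = n*ln(RSS/n) + 2k, BIC = n*ln(RSS/n) + k*ln(n); k = parameters."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + 2 * k, n * np.log(rss / n) + k * np.log(n)

# Hypothetical fits: model B fits better but spends more parameters
y     = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fit_a = [1.2, 1.9, 3.1, 3.8, 5.2, 5.9]     # model A, 2 parameters
fit_b = [1.1, 1.95, 3.05, 3.9, 5.1, 5.95]  # model B, 4 parameters

aic_a, bic_a = aic_bic(y, fit_a, 2)
aic_b, bic_b = aic_bic(y, fit_b, 4)
# Model B's fit improvement outweighs the parameter penalty here,
# so it has the smaller (more negative) AIC and BIC.
print(round(aic_a, 2), round(aic_b, 2))  # -18.13 -22.45
```

Both criteria come out negative, which is perfectly normal; the smaller value still marks the preferred model.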
Condition index – the condition index is calculated using a factor analysis on the independent variables. Values of 10–30 indicate moderate multicollinearity in the regression variables; values > 30 indicate strong multicollinearity.
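The condition index can be sketched from the eigenvalues of the scaled cross-product matrix of the predictors. A minimal Python illustration with an artificially collinear predictor (all names hypothetical; SAS's COLLIN diagnostics also include the intercept column):

```python
import numpy as np

# Hypothetical predictors: x3 is nearly a copy of x1, forcing multicollinearity
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.01 * rng.normal(size=200)     # almost collinear with x1

X = np.column_stack([x1, x2, x3])
# Scale each column to unit length before computing condition indices
Xs = X / np.linalg.norm(X, axis=0)

eigvals = np.linalg.eigvalsh(Xs.T @ Xs)
cond_index = np.sqrt(eigvals.max() / eigvals.min())
print(round(cond_index, 1))  # well above 30, flagging strong multicollinearity
```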
RULE OF THUMB FOR INTERPRETATION OF R2
This model's R2 and Adj. R2 relation is shown in the picture below.
.00 = no linear relationship
.10 = small (R ~ .3)
.25 = moderate (R ~ .5)
.50 = strong (R ~ .7)
1.00 = perfect linear relationship
In our model R2 is 0.9103, which is close to 1.00, so the model shows a very strong linear relationship.
Picture 1.2
MODEL SUMMARY
If the test is significant, the difference is large, so we reject the null hypothesis and accept the alternative hypothesis.
In statistics, the Bayesian information criterion (BIC) or Schwarz criterion (also SBC, SBIC) is a criterion for model selection among a finite set of models
MODEL SELECTION
SELECTED MODEL ANOVA
Here the F and p values are satisfied as per the rules, and the error (residual) is close to zero, so the model is good. There is also no auto-correlation.
DURBIN-WATSON -- AUTOCORRELATION
Linear regression analysis requires that there be little or no autocorrelation in the data. Autocorrelation occurs when the residuals are not independent of each other, i.e. when the value of y(x+1) is not independent of the value of y(x).
We can test the linear regression model for autocorrelation with the Durbin-Watson test. Durbin-Watson's d tests the null hypothesis that the residuals are not linearly auto-correlated. While d can assume values between 0 and 4, values around 2 indicate no autocorrelation. As a rule of thumb, values of 1.5 < d < 2.5 show that there is no autocorrelation in the data. Note, however, that the Durbin-Watson test analyses only linear autocorrelation, and only between direct neighbours (first-order effects).
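The statistic itself is simple to compute; here is a small Python sketch on hand-made residual series (the values are illustrative only):

```python
import numpy as np

def durbin_watson(resid):
    """d = sum (e_t - e_{t-1})^2 / sum e_t^2; values near 2 mean no autocorrelation."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Hand-made residual series (hypothetical)
pos = [1, 1, 1, -1, -1, -1]   # positively autocorrelated -> d well below 2
neg = [1, -1, 1, -1, 1, -1]   # negatively autocorrelated -> d well above 2

print(round(durbin_watson(pos), 3))  # 0.667
print(round(durbin_watson(neg), 3))  # 3.333
```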
SELECTED MODEL SUMMARY
If the Chi-square value is greater than or equal to the critical value, there is a significant difference between the groups we are studying. That is, the difference between the actual data and the expected data (which assumes the groups aren't different) is probably too great to be attributed to chance. So we conclude that our sample supports the hypothesis of a difference.
Critical value: 95th percentile of the Chi-squared distribution with 400 DF = 447.63246783
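The 447.63 figure can be sanity-checked without tables using the Wilson-Hilferty approximation to chi-squared quantiles (a standard approximation; the standard-normal 95th-percentile value is hard-coded below):

```python
from math import sqrt

def chi2_ppf_wh(z, df):
    """Wilson-Hilferty approximation to the chi-squared quantile:
    chi2_p ~ df * (1 - 2/(9*df) + z*sqrt(2/(9*df)))**3, z = normal quantile."""
    return df * (1 - 2 / (9 * df) + z * sqrt(2 / (9 * df))) ** 3

z_95 = 1.6448536269514722   # 95th percentile of the standard normal
print(round(chi2_ppf_wh(z_95, 400), 1))  # ~447.6, matching the table value above
```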
Here all diagnostics are satisfied as per the rules, so we conclude that the model is the best-fit model.
RMSE is an absolute measure of fit.
Use PRESS to assess your model's predictive ability. Usually, the smaller the PRESS value, the better the model's predictive ability. PRESS is used to calculate the predicted R2 which is usually more intuitive to interpret.
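PRESS can be computed without refitting the model n times by using the standard leave-one-out shortcut for OLS; here is a Python sketch on synthetic data (all names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1 + 2 * x1 - x2 + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Leave-one-out shortcut: PRESS = sum (e_i / (1 - h_ii))^2,
# where h_ii are the diagonal entries of the hat matrix X(X'X)^-1 X'
h = np.sum(X * (X @ np.linalg.inv(X.T @ X)), axis=1)
press = np.sum((resid / (1 - h)) ** 2)

sst = np.sum((y - y.mean()) ** 2)
pred_r2 = 1 - press / sst            # predicted R-squared
print(press > np.sum(resid ** 2))    # True: PRESS always exceeds in-sample SSE
```

Because each leverage h_ii lies between 0 and 1, PRESS is always larger than the in-sample SSE, which is why predicted R2 is a stricter (and usually more honest) measure than R2.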
MULTIPLE LINEAR REGRESSION (PROCESS)
One dependent variable (outcome); nine independent variables (predictors); 95% confidence interval.
Our process is easy: Data Cleaning, Data Mining Techniques, Interpretation.
After completion of the model, the predictor variables are as follows: Widely Available, Quantity purchase option, Quality comparable to Branded items, Good shelf life, Quality Conscious, Brand Loyality, Age_group, Monthly_Spent, Education.
PARAMETER ESTIMATES FOR ALL INDEPENDENT VARIABLES
USING PARAMETER ESTIMATES TO INTERPRET THE RESULT
INTERPRETATION
Widely Available
When responses are strongly positive or weakly negative, the weight increases; otherwise it decreases.
Quantity purchase option
As people's judgement towards Q2 varies, Total_weight also decreases (the relationship is directly proportional).
Quality comparable to Branded items
The weight depends on the firmness of people's decision (if they agree, the weight decreases; otherwise it increases).
Good shelf life
When responses are strongly positive or weakly negative, the weight increases; otherwise it decreases.
Quality Conscious
When responses are strongly positive or strongly negative, the weight increases; otherwise it decreases.
Brand Loyality
When responses are strongly positive or weakly negative, the weight increases; otherwise it decreases.
INTERPRETATION
Age_group
If the people are younger or older, the weight is high; otherwise the weight decreases.
Monthly_Spent
As people's monthly spend increases, the weight also increases.
Education
If the person is a graduate or postgraduate, the weight is high; if the person is inter/SSC or a professional, the weight is low.
Credits
Special thanks to Mr. K. Venkat Rao, Director of Reachout Business Analytics Pvt Ltd, for his guidance and support throughout my career.
Reference
sas.com, Wikipedia, statisticssolutions.com, YouTube
Thanks! Any questions? You can find me at [email protected]