-
8/11/2019 Group9 Section D
Multiple Regression
Group 9
Akshey Bhogra (P
Rahul Kaman (P
Shrishti Khushiram (PG
Indu (
Vikram Singh (P
-
Objective
To develop a multivariate regression model that determines a car's price from a variety of characteristics such as mileage, make, model, engine size, interior style, and cruise control.
Sample size: 800 cars (2005 GM models)
-
Simple Linear Regression
Create a simple linear regression to find the relationship between the price of a car and its mileage.
Price = 24723 - 0.17 Mileage
The t statistic for the slope coefficient (b1) is t = -4.09, so mileage is a statistically significant predictor (p < 0.001), but R-square is still quite low (about 2%): mileage alone explains very little of the variation in price.
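In a simple regression, the slope's t statistic and R-square are tied together by R-square = t^2 / (t^2 + df), with df = n - 2. The sketch below assumes n = 800 from the sample size and plugs in t = -4.09; the p-value uses a normal approximation to the t distribution (an assumption that is reasonable with roughly 800 degrees of freedom), not an exact t lookup.

```python
import math

def r_squared_from_t(t: float, n: int) -> float:
    """R-square of a simple linear regression recovered from the slope's t statistic.

    Uses R^2 = t^2 / (t^2 + df), where df = n - 2 residual degrees of freedom.
    """
    df = n - 2
    return t * t / (t * t + df)

def two_sided_p_normal(t: float) -> float:
    """Two-sided p-value from a standard normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2))

t_slope = -4.09  # t statistic for b1 reported on the slide
n_cars = 800     # sample size reported on the objective slide

r2 = r_squared_from_t(t_slope, n_cars)
p = two_sided_p_normal(t_slope)
print(f"R-square ~ {r2:.2%}, two-sided p ~ {p:.1e}")
```

The result shows why a highly significant slope can coexist with a tiny R-square: with 798 degrees of freedom even a weak linear trend produces a large t.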
-
Equation 2 (contd.)
The normal probability plot shows deviation from the normal distribution at both ends, with more variability when prices are higher.
The histogram shows that the residuals are positively skewed. The long upper tail is due to the high-priced models such as the Cadillac XLR V8, a…
The assumption that the errors are normally distributed is violated.
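A normal probability plot pairs each ordered residual with the normal quantile expected at its rank, and sample skewness summarizes a long upper tail like the one in the histogram. This is a minimal stdlib sketch; the plotting-position rule (Blom's formula) is an assumption, since the slides do not say which software settings were used, and the residuals are hypothetical values, not the GM data.

```python
from statistics import NormalDist, fmean

def normal_plot_points(residuals):
    """(theoretical normal quantile, ordered residual) pairs for a normal probability plot.

    Plotting positions use Blom's formula (i - 0.375) / (n + 0.25), one common choice.
    """
    ordered = sorted(residuals)
    n = len(ordered)
    z = NormalDist()  # standard normal distribution
    return [(z.inv_cdf((i - 0.375) / (n + 0.25)), r) for i, r in enumerate(ordered, start=1)]

def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5; a positive value means a long upper tail."""
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])
    m3 = fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5

# Hypothetical residuals: one large positive value mimics an under-predicted high-priced car.
residuals = [-1.2, -0.8, -0.5, -0.3, -0.1, 0.0, 0.2, 0.4, 0.6, 4.5]
points = normal_plot_points(residuals)
print(f"skewness = {sample_skewness(residuals):.2f}")
for q, r in points:
    print(f"{q:6.2f}  {r:6.2f}")
```

If the residuals were normal, the printed pairs would lie close to a straight line; the single large residual bends the upper end away from it.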
-
Equation 2 (contd.): Heteroskedasticity
Residuals versus fitted values:
1. The variance is not constant.
2. The model is inaccurate.
3. Higher prices show more variability.
Residuals versus observation order:
1. Residuals are observed … alphabetical order of observations.
2. Cars with similar make a… similar retail price.
3. Hence make should be added to the model.
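Non-constant variance can also be checked numerically by comparing the residual spread at low and high fitted prices. The sketch below is a simplified Goldfeld-Quandt-style ratio (no middle observations are dropped and no F test is run); it is an illustration of the idea, not the procedure the group used, and the fitted values and residuals are hypothetical.

```python
from statistics import variance

def spread_ratio(fitted, residuals):
    """Residual variance in the upper half of fitted values divided by that in the lower half.

    A ratio well above 1 suggests the residual spread grows with the fitted price.
    """
    pairs = sorted(zip(fitted, residuals))  # order observations by fitted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return variance(high) / variance(low)

# Hypothetical fitted prices and residuals, constructed so the spread grows with price.
fitted = [12_000, 14_000, 16_000, 18_000, 30_000, 35_000, 40_000, 45_000]
residuals = [-500, 400, -300, 450, -4_000, 3_500, -5_000, 4_500]
print(f"variance ratio (high/low) = {spread_ratio(fitted, residuals):.1f}")
```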
-
Specially constructed variables
Qualitative factors such as make and model also affect the retail price of a car, so creating dummy variables for them and including those in the model is important.
Dummy variables can be created for Make, Model, Trim, and Type.
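A categorical variable with k levels is coded as k - 1 indicator (0/1) columns, leaving one baseline level out so the dummies are not perfectly collinear with the intercept. This is a minimal sketch; the rows are hypothetical, and treating "Saturn" as the baseline make is an assumption made only for illustration.

```python
def make_dummies(values, baseline):
    """Return one 0/1 indicator column per category, except the baseline.

    Each coefficient on these columns is then read relative to the baseline category.
    """
    levels = sorted(set(values) - {baseline})
    return {level: [1 if v == level else 0 for v in values] for level in levels}

# Hypothetical rows; "Saturn" as the omitted baseline is an assumption.
makes = ["Buick", "Cadillac", "Chevrolet", "Saturn", "SAAB", "Cadillac"]
dummies = make_dummies(makes, baseline="Saturn")
for level, column in dummies.items():
    print(level, column)
```

The same function applies unchanged to Model, Trim, or Type; only the list of values and the chosen baseline differ.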
-
Multicollinearity
The previous equation omitted Liter and considered only Cylinder.
Cylinder and Liter are highly correlated, so both cannot be used in the same model of price.
A regression model was constructed to determine which one (Cylinder or Liter) is the more precise predictor.
Liter was more precise and easier to measure, hence Cylinder can be removed.
The price data are transformed to a log scale to reduce the effect of the outliers.
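How strongly two predictors overlap can be quantified with their correlation and the variance inflation factor (VIF). With only two predictors in a model, VIF = 1 / (1 - r^2) for each of them. This is a stdlib sketch; the cylinder and displacement values are hypothetical, not taken from the GM data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x, y):
    """Variance inflation factor when a model contains just these two predictors."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)

# Hypothetical engine specs: displacement (liters) rises with cylinder count.
cylinders = [4, 4, 4, 6, 6, 6, 8, 8]
liters = [2.0, 2.2, 2.4, 3.1, 3.5, 3.8, 4.6, 5.3]
print(f"r = {pearson_r(cylinders, liters):.3f}")
print(f"VIF = {vif_two_predictors(cylinders, liters):.1f}")
```

A VIF far above 1 means the standard error of each coefficient is inflated by the overlap, which is why keeping only one of the two variables is a common remedy.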
-
Equation 3
TPrice = 3.98 - 0.000003 Mileage + 0.0997 Liter + 0.0400 Buick + 0.249 Cadillac + 0.00937 Chev + 0.345 SAAB
S = 0.0515753, R-square = 91.7%, R-square (adjusted) = 91.6%
The R-square value has improved considerably compared with the previous model.
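Because the response is a transformed price, each coefficient acts multiplicatively on the price itself. This sketch assumes TPrice = log10(Price), which the slides do not state; the intercept of 3.98 is consistent with base 10 (10^3.98 is roughly 9,500 dollars, while e^3.98 would be about 54 dollars). Under that assumption, adding b to TPrice multiplies Price by 10^b, so the Cadillac coefficient of 0.249 corresponds to a price roughly 77% higher than the omitted baseline make, other variables held fixed.

```python
# Coefficients of Equation 3 as reported on the slide.
# Assumption: TPrice = log10(Price).
EQ3 = {
    "Mileage": -0.000003,
    "Liter": 0.0997,
    "Buick": 0.0400,
    "Cadillac": 0.249,
    "Chev": 0.00937,
    "SAAB": 0.345,
}

def percent_effect(coef: float, change: float = 1.0) -> float:
    """Percent change in Price for a `change`-unit increase in one predictor, others fixed.

    On a log10 scale, adding coef * change to TPrice multiplies Price by 10**(coef * change).
    """
    return (10 ** (coef * change) - 1) * 100

for name in ("Liter", "Cadillac", "SAAB"):
    print(f"{name}: {percent_effect(EQ3[name]):+.1f}%")
print(f"Mileage (+10,000 miles): {percent_effect(EQ3['Mileage'], 10_000):+.1f}%")
```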
-
Interpretations
1. The error terms are not normally distributed.
2. The residuals-versus-fitted plot shows that clustering is still visible.
3. The residuals-versus-observation-order plot shows a systematic pattern, but it is much less pronounced than earlier.
4. More variables need to be included
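The Durbin-Watson statistic is one standard numerical companion to a residuals-versus-observation-order plot; the slides do not say it was used, so this is an illustrative sketch. Values near 2 suggest no first-order autocorrelation, while values well below 2 mean neighbouring residuals tend to share a sign, as can happen when rows are sorted alphabetically and the model misses a variable that groups them. Both residual sequences are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation order.

    DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); it ranges from 0 to 4.
    """
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den

# Hypothetical residual sequences, not from the GM data.
clustered = [0.5, 0.6, 0.4, 0.5, -0.5, -0.6, -0.4, -0.5]
alternating = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
print(f"clustered:   DW = {durbin_watson(clustered):.2f}")
print(f"alternating: DW = {durbin_watson(alternating):.2f}")
```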
-
Equation 4
TPrice = 3.92 - 0.000004 Mileage + 0.0958 Liter + 0.0335 Doors + 0.00752 Cruise + 0.00522 Sound + 0.00626 Leather + 0.0417 Buick + 0.233 Cadillac - 0.0133 Chev - 0.00042 Pontiac + 0.281 SAAB + 0.138 Conv - 0.0890 Hatchback - 0.0711 Sedan
S = 0.0393, R-square = 95.2%, R-square (adjusted) = 95.1%
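To turn Equation 4 into a dollar prediction, the fitted TPrice has to be back-transformed. This sketch rests on assumptions the slides do not state: TPrice = log10(Price), Cruise, Sound, Leather and the make and type columns are 0/1 dummies, and Doors is the number of doors. The example car is hypothetical.

```python
# Coefficients of Equation 4 as reported on the slide.
# Assumptions: TPrice = log10(Price); Cruise, Sound, Leather and the make/type
# columns are 0/1 dummies; Doors is the door count.
EQ4_INTERCEPT = 3.92
EQ4 = {
    "Mileage": -0.000004, "Liter": 0.0958, "Doors": 0.0335,
    "Cruise": 0.00752, "Sound": 0.00522, "Leather": 0.00626,
    "Buick": 0.0417, "Cadillac": 0.233, "Chev": -0.0133, "Pontiac": -0.00042,
    "SAAB": 0.281, "Conv": 0.138, "Hatchback": -0.0890, "Sedan": -0.0711,
}

def predict_price(car: dict) -> float:
    """Predicted retail price in dollars; predictors missing from `car` count as 0."""
    tprice = EQ4_INTERCEPT + sum(coef * car.get(name, 0) for name, coef in EQ4.items())
    return 10 ** tprice  # back-transform from the assumed log10 scale

# Hypothetical car: 4-door Cadillac sedan, 3.6 L engine, 20,000 miles, all three options.
car = {"Mileage": 20_000, "Liter": 3.6, "Doors": 4, "Cruise": 1, "Sound": 1,
       "Leather": 1, "Cadillac": 1, "Sedan": 1}
print(f"predicted price ~ ${predict_price(car):,.0f}")
```

Back-transforming a log-scale prediction estimates a typical (median-like) price rather than the mean price, a standard caveat of log models.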
-
Interpretation
1. The residuals appear homoscedastic and follow a normal distribution more closely.
2. Model could also be included as a predictor, but then the number of dummy variables would be large.
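Adjusted R-square charges a penalty for every predictor, which is why a large block of Model dummies has to raise R-square enough to pay for itself. Computed from the reported R-square values with n = 800, and assuming k counts every predictor in each equation, the formula reproduces both adjusted figures (91.6% and 95.1%). The extra-dummy counts in the loop are hypothetical.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-square for n observations and k predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 800
# Equation 3: Mileage, Liter and four make dummies -> k = 6
print(f"Eq 3: {adjusted_r2(0.917, n, 6):.3f}")
# Equation 4: Mileage, Liter, Doors, Cruise, Sound, Leather,
# five make dummies and three type dummies -> k = 14
print(f"Eq 4: {adjusted_r2(0.952, n, 14):.3f}")

# Hypothetical: extra Model dummies that leave R-square at 95.2% only add penalty.
for extra in (0, 20, 50):
    print(f"{14 + extra} predictors: adjusted R-square = {adjusted_r2(0.952, n, 14 + extra):.4f}")
```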
-
Recommendations
1. Try several statistical models before settling on the final multivariate regression model.
2. Check the model assumptions.
3. If two or more variables are related to each other, keep the one that is more significant and easier to measure.
4. Include qualitative variables in the model, through dummy variables, if they affect the dependent variable.
-
Thank You