econometrics project iorganda beatrice cristina 133 revised


Upload: emi-baka

Post on 03-Dec-2015


Page 1: Econometrics Project Iorganda Beatrice Cristina 133 Revised

- The Academy of Economic Studies -
The Faculty of Economic Studies in Foreign Languages

Influential factors of the price of nail polish

- Econometrics Project -

Student: Iorganda Beatrice Cristina

Coordinator: Prof. Dr. Daniela Serban

Group 133, Series A


January, 2014

Introduction

The subject of this project comes from a personal curiosity: to determine whether the price of a product is related to its durability. The product chosen, nail polish, is one that women use frequently. It was chosen because women are the respondents who usually take the time to answer questionnaires and who report their actual consumption truthfully.

Fig.1. Nail polish brands

The methodology consists of simple and multiple linear regression, together with hypothesis testing. Hypothesis testing is an integral part of quality research because it provides a reliable way to decide whether an observed effect can be attributed to a presumed cause.

The outcomes of this work can be used in further studies for retailers, because the questionnaire provides information on the age of the women who use a specific brand, the average price expenditure index, and how long women keep their hands under water. These indicators can help companies develop innovative products suited to the modern woman.

Database description

The data selected from the questionnaire are the three most important factors on which this econometrics project is built; two of them are assumed to influence the third.

Durability: x1
Time: x2
Price: y

This case study uses a sample of 37 observations.

Hypothesis testing

Based on the chosen model, we will conduct 2 hypothesis tests that reflect the importance of analyzing certain features and assumptions related to our data.

Ist category of hypothesis testing

First, I test a claim about the most used brand. According to the questionnaire, most of the women use Flormar nail polish because of its good quality-price ratio, and it seems to last longer than the other brands mentioned. The average durability reported by Flormar users is 4.428571 minutes. Sample results for 23 observations show that the average durability of the other brands (all except Flormar) is 4.152173913 minutes, with a standard deviation of 1.76734767 minutes. We use hypothesis testing to see whether the sample supports the belief held by these women, whose average age is 22.

Survey data:

x̄1 = 4.42
x̄2 = 4.15
n1 = 23
n2 = 14
s1² = 7.76
s2² = 3.12

The computations will be made in minutes.

Step 1

Initial assumption: Women with an average age of 22 believe that Flormar nail polish lasts longer than other brands.

Alternative hypothesis: These women are wrong and Flormar nail polish does not last longer than other brands.

Step 2

H0: μ = 4.428
H1: μ ≠ 4.428

Step 3

This is a two-sided test on the mean, because the alternative hypothesis is stated with ≠.

Step 4

The significance level chosen is α = 5%, and therefore the rejection region is (-∞, -1.96) ∪ (1.96, ∞).

Step 5

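$$Z_{calc} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{4.42 - 4.15}{\sqrt{\dfrac{7.76}{23} + \dfrac{3.12}{14}}} = \frac{0.27}{\sqrt{0.56}} \approx 0.36$$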

Step 6

As Zcalc does not fall into the rejection region, we cannot reject H0. At the 5% significance level, the sample does not provide enough evidence to conclude that Flormar nail polish lasts longer than other brands, nor that it lasts less.
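The six steps above can be retraced with a short Python sketch. It follows the Step 5 formula with the sample sizes in the denominator and assumes that 7.76 and 3.12 are sample variances (3.12 is the square of the reported standard deviation 1.767). The function name `two_sample_z` is hypothetical and not part of the original Excel work.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, var1, var2, n1, n2):
    """z statistic for the difference between two independent sample means."""
    return (mean1 - mean2) / sqrt(var1 / n1 + var2 / n2)


# Survey data as listed in the text; treating 7.76 and 3.12 as variances
# is an assumption of this sketch.
z_calc = two_sample_z(4.42, 4.15, 7.76, 3.12, 23, 14)

alpha = 0.05
# Two-sided test: the critical value leaves alpha/2 in each tail (about 1.96).
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z_calc) > z_crit

print(f"z_calc = {z_calc:.2f}, z_crit = {z_crit:.2f}, reject H0: {reject_h0}")
```

Under these assumptions the statistic stays well inside (-1.96, 1.96), which agrees with the decision in Step 6.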

IInd category of hypothesis testing

According to the analysis of the current questionnaire, 37% of the women who answered prefer Flormar polish to other brands because of its good quality-price ratio (quality being given by the durability of the product). A sample test was conducted, and for 31 observations the results show that 14 women believe that Flormar exceeds the other brands in durability. We need to verify whether this result supports the opinion of the women who took part in this experiment.

Survey data:

p̂ = 14/31 ≈ 0.4516 = 45.16%

n = 31

The computations will be made in percentages.

Step 1

Initial assumption: The questionnaire indicates that 37% of women prefer Flormar nail polish because of its good quality-price ratio (it has high durability).

Alternative hypothesis: The questionnaire is wrong, women do not prefer Flormar nail polish, and the true share of women who prefer it is less than 37%.

Step 2

H0: π = 37%
H1: π < 37%

Step 3

This is a left-tailed test on a proportion, because the alternative hypothesis is stated with <.

Step 4

The significance level chosen is α=5% and therefore the rejection region is (-∞, -1.645).

Step 5

$$Z_{calc} = \frac{\hat{p} - \pi_0}{\sqrt{\dfrac{\pi_0 (100 - \pi_0)}{n}}} = \frac{45.16 - 37}{\sqrt{\dfrac{37 (100 - 37)}{31}}} = \frac{8.16}{\sqrt{75.19}} \approx 0.94$$

Step 6

As Zcalc does not fall into the rejection region (-∞, -1.645), we cannot reject H0. At the 5% significance level, the sample does not provide enough evidence to support H1, that is, that fewer than 37% of women prefer Flormar.
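The proportion test can be checked in the same way. The sketch below works with proportions on the 0 to 1 scale instead of percentages, which leaves the statistic unchanged; `proportion_z` is a hypothetical helper name, and the inputs are the 14 out of 31 answers and the 37% claim stated above.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(successes, n, pi0):
    """z statistic for a one-sample test on a proportion (0 to 1 scale)."""
    p_hat = successes / n
    standard_error = sqrt(pi0 * (1 - pi0) / n)
    return (p_hat - pi0) / standard_error


z_calc = proportion_z(14, 31, 0.37)

alpha = 0.05
# Left-tailed test: the whole rejection region lies below this value (about -1.645).
z_crit = NormalDist().inv_cdf(alpha)
reject_h0 = z_calc < z_crit

print(f"z_calc = {z_calc:.2f}, z_crit = {z_crit:.3f}, reject H0: {reject_h0}")
```

Because the sample proportion is above 37%, the statistic is positive and cannot fall in a left-tail rejection region.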

SIMPLE LINEAR REGRESSION MODEL

We first analyze the influence of durability on price. This is a model with one regressor.

Consider the general form of the simple linear regression function:

$$Y_i = \beta_1 + \beta_2 X_2 + \varepsilon$$

The variables of this model are $Y_i$ and $X_2$.

$Y_i$ = value of the dependent variable, price
$X_2$ = value of the independent variable, durability
ε = residual term, which collects the influences on price not captured by the regressor

The specific model for this sample is: Price = 6.89 + 1.511 Durability + ε
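As an illustration of how such coefficients are obtained, the following sketch applies the closed-form least-squares formulas. The questionnaire data are not reproduced in this report, so the fit is run on hypothetical toy values; only `predict_price` uses the coefficients reported above. All function and variable names are assumptions made for this example.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b1 + b2 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data (NOT the survey data): points on the line y = 1 + 2x.
toy_durability = [1.0, 2.0, 3.0, 4.0]
toy_price = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_ols(toy_durability, toy_price)


def predict_price(durability, b1=6.89, b2=1.511):
    """Predicted price from the fitted model reported in the text."""
    return b1 + b2 * durability


print(intercept, slope)
print(predict_price(4))
```

With the real 37 observations in place of the toy lists, `fit_simple_ols` should return values close to the Excel coefficients.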

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.2587
R Square            0.06693
Adjusted R Square   0.04027
Standard Error      12.4494
Observations        37

ANOVA
             df   SS       MS         F         Significance F
Regression   1    389.08   389.0798   2.51041   0.122091661
Residual     35   5424.5   154.9865
Total        36   5813.6

             Coefficients   Standard Error   t Stat     P-value   Lower 95%     Upper 95%   Lower 95.0%   Upper 95.0%
Intercept    6.89038        4.5474           1.515252   0.13869   -2.34122916   16.122      -2.3412292    16.12199564
Durability   1.51147        0.954            1.584428   0.12209   -0.42515703   3.448088    -0.425157     3.448088081
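The entries of the summary output are linked by standard identities, so they can be cross-checked from the sums of squares alone. The sketch below recomputes them from the rounded ANOVA figures, which means small differences in the last decimals are expected; the variable names are chosen for this example only.

```python
from math import sqrt

n = 37   # observations
k = 1    # regressors

# Rounded values taken from the ANOVA table.
ss_regression = 389.08
ss_residual = 5424.5
ss_total = ss_regression + ss_residual

r_squared = ss_regression / ss_total
multiple_r = sqrt(r_squared)
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

ms_regression = ss_regression / k
ms_residual = ss_residual / (n - k - 1)
f_stat = ms_regression / ms_residual
standard_error = sqrt(ms_residual)

# t statistic of the slope: coefficient divided by its standard error.
t_slope = 1.51147 / 0.954

print(f"R^2 = {r_squared:.5f}, adjusted R^2 = {adj_r_squared:.5f}")
print(f"F = {f_stat:.4f}, standard error = {standard_error:.4f}, t = {t_slope:.4f}")
```

In a regression with a single regressor, F equals the square of the slope's t statistic, which is why Significance F and the slope's p-value coincide (0.122).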

The level of correlation between the variables is shown by Multiple R. In this case it is 0.259, which does not belong to the interval [0.75, 1]. This shows a low level of correlation between the variables.

To interpret the coefficients, we look first at the intercept. It represents the predicted price if durability were 0. However, since the regressor cannot be 0, the intercept has no meaningful interpretation. The slope is 1.511, which indicates a positive relationship between Price and Durability: each additional unit of Durability is associated with an increase of 1.511 units in Price.

To test the validity of the model, we take as the null hypothesis that durability does not influence price, so that the expected price is the same for all observations.

H0: β2 = 0 (the expected values of Price1, Price2, ..., Price37 are all equal)
H1: β2 ≠ 0 (the expected price differs for at least one pair i, j)

To test this claim, we can compare F calculated with F critical for this model, or compare Significance F with α = 5%. Significance F (0.12) > 0.05, therefore we cannot reject H0, and at the 5% significance level the model is NOT valid.

To test the inference on the slope, we examine its 95% confidence interval.


The 95% confidence interval for the slope is (-0.425, 3.448). This interval contains the value 0, so the slope may not differ from zero. We confirm this by comparing the p-value (0.12) with α (0.05): the p-value is higher than 0.05, therefore the slope is not statistically significant.
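The interval can be reproduced from the coefficient and its standard error. In this sketch the critical value 2.0301 is the tabulated two-sided 5% value of Student's t with 35 degrees of freedom; it is hard-coded as an assumption rather than computed, and the standard error is the rounded 0.954 from the table.

```python
coefficient = 1.51147      # slope for Durability, from the regression output
standard_error = 0.954     # rounded standard error from the same output
t_critical = 2.0301        # t(0.975, df=35) taken from tables, not computed here

lower = coefficient - t_critical * standard_error
upper = coefficient + t_critical * standard_error

# A 95% interval that contains 0 means the slope is not significant at 5%.
contains_zero = lower <= 0 <= upper

print(f"({lower:.3f}, {upper:.3f}), contains zero: {contains_zero}")
```

Checking whether the interval contains zero and comparing the p-value with α are two views of the same two-sided t test, so they always lead to the same decision.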

Residual Analysis - Violation of assumptions

The errors are distributed as shown in the normal probability plot, with a skew to the right.

From the residual plot, we can see that the errors are randomly scattered, therefore there is no correlation between the errors. From the line fit plot, it can be noticed that the errors are not equally spread around the mean, therefore the model is heteroskedastic.

Finally, I conducted a Durbin-Watson test for this model. In the Excel file, I calculated the statistic d, which is 2.22 for the simple regression. dL and dU are 1.217 and 1.322 respectively; since d is higher than dU, there is no statistical evidence that the errors are positively autocorrelated.
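The Durbin-Watson statistic and the decision rule used here can be written compactly. The sketch below is not the Excel computation from the project: the residuals in the example are hypothetical, and only the values d = 2.22, dL = 1.217 and dU = 1.322 come from the text.

```python
def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def dw_decision(d, d_lower, d_upper):
    """Decision rule of the test for positive autocorrelation."""
    if d < d_lower:
        return "positive autocorrelation"
    if d > d_upper:
        return "no evidence of positive autocorrelation"
    return "inconclusive"


# Hypothetical residuals that alternate in sign, only to exercise the function.
example_residuals = [1.0, -1.0, 1.0, -1.0]
d_example = durbin_watson(example_residuals)

# Values reported in the text for the simple regression.
decision = dw_decision(2.22, 1.217, 1.322)
print(d_example, decision)
```

Values of d near 2 indicate no first-order autocorrelation, values near 0 indicate positive autocorrelation, and values near 4 indicate negative autocorrelation.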

MULTIPLE LINEAR REGRESSION MODEL

We will add another independent variable to our model, time.

$$Y_i = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$$

The variables of this model are $Y_i$, $X_2$ and $X_3$.

[Figure, simple regression model: Normal Probability Plot (Price against Sample Percentile)]

[Figure, simple regression model: Durability Line Fit Plot (Price and Predicted Price against Durability)]

[Figure, simple regression model: Durability Residual Plot (Residuals against Durability)]

$Y_i$ = value of the dependent variable, price
$X_2$ = value of the independent variable, durability
$X_3$ = value of the independent variable, time
ε = residual term, which collects the influences on price not captured by the regressors

The specific model for our sample is: Price = 6.89 + 1.38 Durability - 0.17 Time+ ε

To analyze the correlation between the variables, we look at Multiple R. In this case it is 0.26, which does not belong to the interval [0.75, 1]. This shows a low level of correlation between the variables, although slightly improved by adding an extra regressor.

To interpret the coefficients, we first look at the intercept. It represents the predicted price if both regressors were 0. However, since the two regressors cannot be 0, the intercept has no meaningful interpretation. The first slope is 1.38, which indicates a positive relationship between Price and Durability: each additional unit of Durability is associated with an increase of 1.38 units in Price. The second slope is -0.17, which indicates a negative relationship between Price and Time: each additional unit of Time is associated with a decrease of 0.17 units in Price.

To test the validity of the model, we take as the null hypothesis that neither regressor influences price, so that the expected price is the same for all observations.

H0: β2 = β3 = 0 (the expected values of Price1, Price2, ..., Price37 are all equal)
H1: at least one of β2, β3 is different from 0 (the expected price differs for at least one pair i, j)

To test this claim, we can compare F calculated with F critical for this model, or compare Significance F with α = 5%. Significance F (0.28) > 0.05, therefore we cannot reject H0, and at the 5% significance level the model is NOT valid.

To test the inference on the slopes, we examine their 95% confidence intervals.

The first confidence interval is (-0.684, 3.44). This interval contains the value 0, so we compare the p-value (0.18) with α (0.05). The p-value is higher than 0.05, therefore this slope is not statistically significant.

The second confidence interval is (-1.01, 0.67). This interval contains the value 0, so we compare the p-value (0.68) with α (0.05). The p-value is higher than 0.05, therefore this slope is not statistically significant.

From the normal probability plot, we can see that the distribution of errors is skewed to the right. Both residual plots show random scattering of the errors, meaning there is no correlation between the errors. Also, in both line fit plots, the errors are unevenly dispersed around the mean, showing that both models are heteroskedastic.

I conducted a Durbin-Watson test for this model as well, and the result was the same as for the simple regression: d is higher than dU, meaning there is no statistical evidence that the errors are positively autocorrelated.

Finally, I analyzed the two independent variables to obtain their coefficient of correlation, which in this case is -0.31. This shows that the two variables, durability and time, are weakly negatively correlated, and that multicollinearity does not occur.
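The multicollinearity check can be made more explicit with the variance inflation factor (VIF), which for a model with two regressors depends only on their correlation. The VIF is not used in the original analysis; it is added here as an assumption of this sketch, together with the hypothetical data used to exercise `pearson_r`. Only the value -0.31 comes from the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson coefficient of correlation between two equally long samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def vif_two_regressors(r):
    """Variance inflation factor when the model has exactly two regressors."""
    return 1 / (1 - r ** 2)


# Hypothetical, perfectly negatively related values, only to test pearson_r.
r_example = pearson_r([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])

# Correlation between durability and time reported in the text.
vif = vif_two_regressors(-0.31)
print(r_example, round(vif, 3))
```

A VIF close to 1 means the regressors barely inflate each other's standard errors; a common rule of thumb treats values above 5 or 10 as a sign of multicollinearity.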

[Figure: Durability Residual Plot (Residuals against Durability)]

[Figure: Durability Line Fit Plot (Price and Predicted Price against Durability)]

[Figure: Time Line Fit Plot (Price and Predicted Price against Time)]

[Figure: Time Residual Plot (Residuals against Time)]

[Figure: Normal Probability Plot (Price against Sample Percentile)]

Conclusion

Based on the limited sample evidence and the low correlation between the variables, the study must be repeated, because we cannot be sure that drying time and the durability of the nail polish on the nail are the only factors that influence the price of such a product.
