Chapter 1: Regression Analysis
“Regression is a statistical technique that establishes a functional relationship between two or more variables in the form of an equation, so that the value of one variable can be estimated from the value of another variable.”
Regression Analysis
• Simple Linear Regression Model
$y = \beta_0 + \beta_1 x + \varepsilon$
• Simple Linear Regression Equation
$y = \beta_0 + \beta_1 x$
• Estimated Simple Linear Regression Equation
$\hat{y} = b_0 + b_1 x$
Principle of least squares technique
Case 1:
Graph 1
Observed points : (4,8); (8,1); (12,6)
Estimated points : (4,6); (8,5); (12,4)
Graph 2
Observed points : (4,8); (8,1); (12,6)
Estimated points : (4,2); (8,5); (12,8)

Error (Graph 1)            Error (Graph 2)
8 - 6 = 2                  8 - 2 = 6
1 - 5 = -4                 1 - 5 = -4
6 - 4 = 2                  6 - 8 = -2
Total error = 0            Total error = 0

Absolute error (Graph 1)   Absolute error (Graph 2)
|8 - 6| = 2                |8 - 2| = 6
|1 - 5| = 4                |1 - 5| = 4
|6 - 4| = 2                |6 - 8| = 2
Total absolute error = 8   Total absolute error = 12
Case 2:
Graph 1
Observed points: (2,4); (6,7); (10,2)
Estimated points: (2,4); (6,3); (10,2)
Graph 2
Observed points: (2,4); (6,7); (10,2)
Estimated points: (2,5); (6,4); (10,3)

Absolute error (Graph 1)   Absolute error (Graph 2)
|4 - 4| = 0                |4 - 5| = 1
|7 - 3| = 4                |7 - 4| = 3
|2 - 2| = 0                |2 - 3| = 1
Total absolute error = 4   Total absolute error = 5

Squared error (Graph 1)    Squared error (Graph 2)
(4 - 4)^2 = 0              (4 - 5)^2 = 1
(7 - 3)^2 = 16             (7 - 4)^2 = 9
(2 - 2)^2 = 0              (2 - 3)^2 = 1
Sum of squared errors = 16 (Graph 1)
Sum of squared errors = 11 (Graph 2)
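The arithmetic in Case 2 can be checked with a short script. This is a minimal sketch; the function and variable names are illustrative and not part of the slides. It shows that the two criteria disagree here: total absolute error favours Graph 1, while the sum of squared errors favours Graph 2, because squaring penalises the single large miss in Graph 1 more heavily.

```python
# Observed y-values and the two candidate sets of estimated y-values (Case 2).
observed = [4, 7, 2]
graph1 = [4, 3, 2]
graph2 = [5, 4, 3]


def total_error(obs, est):
    """Signed errors summed; positive and negative errors can cancel."""
    return sum(o - e for o, e in zip(obs, est))


def total_abs_error(obs, est):
    """Sum of absolute errors."""
    return sum(abs(o - e) for o, e in zip(obs, est))


def sum_squared_error(obs, est):
    """Sum of squared errors, the quantity least squares minimises."""
    return sum((o - e) ** 2 for o, e in zip(obs, est))


for name, est in (("Graph 1", graph1), ("Graph 2", graph2)):
    print(
        name,
        "abs:", total_abs_error(observed, est),
        "squared:", sum_squared_error(observed, est),
    )
```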
Least Squares Method
• Least Squares Criterion
$\min \sum (y_i - \hat{y}_i)^2$
where:
$y_i$ = observed value of the dependent variable for the ith observation
$\hat{y}_i$ = estimated value of the dependent variable for the ith observation
• Slope for the Estimated Regression Equation
$b_1 = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$
where:
x = value of independent variable for ith observation
y = value of dependent variable for ith observation
n = total number of observations
• y-Intercept for the Estimated Regression Equation
$b_0 = \bar{y} - b_1\bar{x}$
where:
$\bar{y}$ = mean value for dependent variable
$\bar{x}$ = mean value for independent variable
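The slope and intercept formulas translate directly into code. The sketch below is one possible implementation; the function name `fit_simple_regression` and the four data points are hypothetical, chosen so that the points lie exactly on y = 1 + 2x and the result is easy to verify by hand.

```python
def fit_simple_regression(x, y):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope: (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: y-bar - b1 * x-bar
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


# Hypothetical data lying exactly on y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
b0, b1 = fit_simple_regression(x, y)
print(b0, b1)
```

The denominator is zero when all x values are identical, so a production version would need to guard against that case.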
• Simple Linear Regression
Reed Auto periodically has a special week-long sale. As part of the advertising campaign Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown below.
Number of TV Ads    Number of Cars Sold
1                   14
3                   24
2                   18
1                   17
3                   27
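For the Reed Auto sample, the least squares formulas work out as follows. This is a worked sketch, taking x as the number of TV ads and y as the number of cars sold; the column sums are computed from the five observations.

```latex
\begin{aligned}
n &= 5,\quad \sum x = 10,\quad \sum y = 100,\quad \sum xy = 220,\quad \sum x^2 = 24 \\
b_1 &= \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}
     = \frac{5(220) - (10)(100)}{5(24) - (10)^2} = \frac{100}{20} = 5 \\
b_0 &= \bar{y} - b_1\bar{x} = 20 - 5(2) = 10 \\
\hat{y} &= 10 + 5x
\end{aligned}
```

On this fit, each additional TV ad is associated with about five more cars sold in the sale week.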
• The HRD manager of a company wants a measure he can use to fix the monthly income of persons applying for a job in the production department. As an experimental project, he collected data on 7 persons from that department on their years of service and their monthly income (in thousands).
Years of experience 11 7 9 5 8 6 10
Income 10 8 6 5 9 7 11
• Find the regression equation of income on years of service.
• What starting income would you recommend for a person applying for the job after having served in a similar capacity in another company for 13 years?
• Do you think other factors are to be considered (in addition to the years of service) in fixing the income? Explain.
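A possible way to work the first two questions numerically is sketched below. The function name is illustrative; the data are the seven observations given above. Note that 13 years lies outside the observed range of 5 to 11 years, so the prediction is an extrapolation and should be treated with caution.

```python
def fit_simple_regression(x, y):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


years = [11, 7, 9, 5, 8, 6, 10]    # years of service
income = [10, 8, 6, 5, 9, 7, 11]   # monthly income, in thousands

b0, b1 = fit_simple_regression(years, income)
predicted_income = b0 + b1 * 13    # applicant with 13 years of service

print(f"income = {b0:.2f} + {b1:.2f} * years")
print(f"predicted income for 13 years: {predicted_income:.2f} thousand")
```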
Properties of regression lines and their coefficients:
1. The correlation coefficient is the geometric mean of the two regression coefficients.
2. The sign of the correlation coefficient is the same as that of the regression coefficients.
3. Regression coefficients are independent of a change of origin but not of a change of scale.
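These properties can be checked numerically. The sketch below reuses the years-of-service and income figures from the earlier exercise; the helper names are illustrative. It compares r with the signed geometric mean of the two regression coefficients, then shifts and rescales x to see what happens to the slope.

```python
import math


def regression_coefficient(u, v):
    """Slope of the least squares regression of v on u."""
    n = len(u)
    num = n * sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v)
    den = n * sum(a * a for a in u) - sum(u) ** 2
    return num / den


def correlation(u, v):
    """Pearson correlation coefficient between u and v."""
    n = len(u)
    num = n * sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v)
    var_u = n * sum(a * a for a in u) - sum(u) ** 2
    var_v = n * sum(b * b for b in v) - sum(v) ** 2
    return num / math.sqrt(var_u * var_v)


x = [11, 7, 9, 5, 8, 6, 10]
y = [10, 8, 6, 5, 9, 7, 11]

b_yx = regression_coefficient(x, y)   # regression of y on x
b_xy = regression_coefficient(y, x)   # regression of x on y
r = correlation(x, y)

# Properties 1 and 2: r is the geometric mean, carrying the common sign.
geometric_mean = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Property 3: a change of origin leaves the slope unchanged,
# a change of scale does not.
shifted = regression_coefficient([xi - 5 for xi in x], y)
scaled = regression_coefficient([10 * xi for xi in x], y)

print(r, geometric_mean, b_yx, shifted, scaled)
```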
In finance, it is of interest to look at the relationship between Y, a stock’s average return, and X, the overall market return. The slope coefficient computed by linear regression is called the stock’s beta by investment analysts. A beta greater than 1 indicates that the stock is relatively sensitive to changes in the market; a beta less than 1 indicates that the stock is relatively insensitive. For the following data, compute the beta and suggest market trend.
X (%): 10 12 8 15 9 11 8 10 13 11
Y (%): 11 15 3 18 10 12 6 7 18 13
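One way to compute the beta for these ten observations is sketched below, using the same slope formula as for simple linear regression. The function and variable names are illustrative.

```python
def fit_simple_regression(x, y):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


market_return = [10, 12, 8, 15, 9, 11, 8, 10, 13, 11]   # X (%)
stock_return = [11, 15, 3, 18, 10, 12, 6, 7, 18, 13]    # Y (%)

alpha, beta = fit_simple_regression(market_return, stock_return)
print(f"beta = {beta:.3f}, intercept = {alpha:.3f}")

# A beta above 1 marks the stock as relatively sensitive to the market.
is_sensitive = beta > 1
print("relatively sensitive" if is_sensitive else "relatively insensitive")
```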
Multiple regression Analysis
• A linear regression equation with more than one independent variable is called a multiple regression model.
The linear regression equation with k independent variables takes the form:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k + \varepsilon$
where
y is the value of the dependent variable to be estimated
$\beta_0$ is a constant
$\beta_1, \beta_2, \dots, \beta_k$ are the regression coefficients associated with each of the independent variables x
$\varepsilon$ is the random error due to chance.

Let the fitted linear regression equation be
$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$
which minimizes the sum of squared errors (SSE) $\sum (y - \hat{y})^2$
where
$\hat{y}$ is the estimated value of the dependent variable y
$b_1, b_2, b_3, \dots, b_k$ are the partial regression coefficients, obtained by the principle of least squares.
• Let us consider the case with two independent variables and one dependent variable.
The multiple linear regression model involving two independent variables is:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
where
y is the dependent variable
$x_1$ and $x_2$ are independent variables
$\varepsilon$ is the random error due to chance
$\beta_0$ is the y-intercept
$\beta_1, \beta_2$ are the regression coefficients.

Let the fitted multiple linear regression equation be
$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$
or
$\hat{y} = b_0 + b_{y1.2}\, x_1 + b_{y2.1}\, x_2$
where
$\hat{y}$ is the estimated value of the dependent variable y
$x_1, x_2$ are the independent variables
$b_0, b_1, b_2$ are the unknown constants, determined by the principle of least squares, which minimizes the sum of squared errors (SSE) $\sum (y - \hat{y})^2$.
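By solving the following equations, the values of $b_0$, $b_1$ (= $b_{y1.2}$) and $b_2$ (= $b_{y2.1}$) can be determined:
$\sum y = n b_0 + b_{y1.2} \sum x_1 + b_{y2.1} \sum x_2$
$\sum x_1 y = b_0 \sum x_1 + b_{y1.2} \sum x_1^2 + b_{y2.1} \sum x_1 x_2$
$\sum x_2 y = b_0 \sum x_2 + b_{y1.2} \sum x_1 x_2 + b_{y2.1} \sum x_2^2$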
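The three normal equations form a 3 x 3 linear system in the unknown coefficients. The sketch below builds that system from the data sums and solves it by Gauss-Jordan elimination in plain Python. The function name and the five data points are hypothetical; the data were generated from y = 1 + 2*x1 + 3*x2, so the solver should recover those values up to rounding.

```python
def solve_normal_equations(x1, x2, y):
    """Return [b0, b1, b2] by solving the three normal equations."""
    n = len(y)

    def s(a, b):
        # Sum of elementwise products, e.g. s(x1, y) = sum of x1*y.
        return sum(p * q for p, q in zip(a, b))

    # Augmented matrix: one row per normal equation, last column is the RHS.
    m = [
        [n,       sum(x1),   sum(x2),   sum(y)],
        [sum(x1), s(x1, x1), s(x1, x2), s(x1, y)],
        [sum(x2), s(x1, x2), s(x2, x2), s(x2, y)],
    ]
    size = 3
    for col in range(size):
        # Partial pivoting: bring the largest entry in this column up.
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for row in range(size):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [a - factor * b for a, b in zip(m[row], m[col])]
    return [m[i][size] / m[i][i] for i in range(size)]


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [9, 8, 19, 18, 29]

b0, b1, b2 = solve_normal_equations(x1, x2, y)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

If x1 and x2 are perfectly collinear the system has no unique solution and the division by the pivot fails, which this sketch does not handle.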
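Let the fitted multiple linear regression equation be
$y = b_0 + b_1 x_1 + b_2 x_2$
or
$y = b_0 + b_{y1.2}\, x_1 + b_{y2.1}\, x_2$ ---(1)
$\bar{y} = b_0 + b_{y1.2}\, \bar{x}_1 + b_{y2.1}\, \bar{x}_2$ ---(2)
(1) - (2):
$(y - \bar{y}) = b_{y1.2}(x_1 - \bar{x}_1) + b_{y2.1}(x_2 - \bar{x}_2)$
$Y = b_{y1.2} X_1 + b_{y2.1} X_2$
where
$Y = y - \bar{y}$
$X_1 = x_1 - \bar{x}_1$
$X_2 = x_2 - \bar{x}_2$

$b_{y1.2} = \dfrac{\sum Y X_1 \sum X_2^2 - \sum Y X_2 \sum X_1 X_2}{\sum X_1^2 \sum X_2^2 - (\sum X_1 X_2)^2}$
$b_{y2.1} = \dfrac{\sum Y X_2 \sum X_1^2 - \sum Y X_1 \sum X_1 X_2}{\sum X_1^2 \sum X_2^2 - (\sum X_1 X_2)^2}$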
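The closed-form expressions for the two partial regression coefficients come from the normal equations written in deviation form. A sketch of that intermediate step, using the same symbols:

```latex
\begin{aligned}
\sum Y X_1 &= b_{y1.2} \sum X_1^2 + b_{y2.1} \sum X_1 X_2 \\
\sum Y X_2 &= b_{y1.2} \sum X_1 X_2 + b_{y2.1} \sum X_2^2 \\
b_0 &= \bar{y} - b_{y1.2}\,\bar{x}_1 - b_{y2.1}\,\bar{x}_2
\end{aligned}
```

Solving the first two equations simultaneously (for example by Cramer's rule) gives $b_{y1.2}$ and $b_{y2.1}$ with the common denominator $\sum X_1^2 \sum X_2^2 - (\sum X_1 X_2)^2$. The third line recovers the intercept, since the fitted plane passes through the point of means.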
• A marketing manager of a company wants to predict demand for the product. He strongly believes that demand (Y) is highly influenced by the annual average price (X1) of the product (in units) and the advertising expenditure (X2) (Rs in lakh). He has collected past data to study the effect of these factors on demand, given below:
Y    4   6   7   9   13  15
X1   15  12  8   6   4   3
X2   30  24  20  14  10  4
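The deviation-form formulas can be applied to the demand data as sketched below. The function name is illustrative, and the intercept is obtained as b0 = y-bar - b_y1.2 * x1-bar - b_y2.1 * x2-bar. Price and advertising move closely together in this small sample, so the individual coefficients may be hard to interpret on their own.

```python
def fit_two_predictors(x1, x2, y):
    """Return (b0, b_y1_2, b_y2_1) using the deviation-form formulas."""
    n = len(y)
    mean_x1, mean_x2, mean_y = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Deviations from the means.
    X1 = [v - mean_x1 for v in x1]
    X2 = [v - mean_x2 for v in x2]
    Y = [v - mean_y for v in y]
    s11 = sum(a * a for a in X1)
    s22 = sum(a * a for a in X2)
    s12 = sum(a * b for a, b in zip(X1, X2))
    sy1 = sum(a * b for a, b in zip(Y, X1))
    sy2 = sum(a * b for a, b in zip(Y, X2))
    denom = s11 * s22 - s12 ** 2
    b_y1_2 = (sy1 * s22 - sy2 * s12) / denom
    b_y2_1 = (sy2 * s11 - sy1 * s12) / denom
    b0 = mean_y - b_y1_2 * mean_x1 - b_y2_1 * mean_x2
    return b0, b_y1_2, b_y2_1


demand = [4, 6, 7, 9, 13, 15]         # Y
price = [15, 12, 8, 6, 4, 3]          # X1
advertising = [30, 24, 20, 14, 10, 4] # X2

b0, b_price, b_adv = fit_two_predictors(price, advertising, demand)
print(f"Y-hat = {b0:.4f} + ({b_price:.4f}) X1 + ({b_adv:.4f}) X2")
```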
Ex: Christmas week is a critical period for most ski resorts. Because many students and adults are free from other obligations, they are able to spend several days indulging in their favorite pastime, skiing. A large proportion of gross revenue is earned during this period. A ski resort in Vermont wanted to determine the effect that weather had on its sales of lift tickets. The manager of the resort collected data on the number of lift tickets sold during Christmas week (y), the total snowfall in inches (x1), and the average temperature in degrees Fahrenheit (x2) for the past 10 years. Develop the multiple regression model.
Tickets   Snowfall   Temperature
6835      19         11
7870      15         -19
6173      7          36
7979      11         22
7639      19         14
7167      2          -20
8094      21         39
9903      19         27
9788      18         26
9557      20         16
• The Federal Reserve is performing a preliminary study to determine the relationship between certain economic indicators and annual percentage change in the gross national product (GNP). Two such indicators being examined are the amount of the federal government’s deficit (in billions of dollars) and the Dow Jones Industrial Average (the mean value over the year). Data for 6 years follow:
Change in GNP     2.5    -1.0   4.0    1.0    1.5    3.0
Federal Deficit   100.0  400.0  120.0  200.0  180.0  80.0
Dow Jones         2850   2100   3300   2400   2550   2700
i) Calculate the least squares equation that best describes the data.
ii) What % change in GNP would be expected in a year in which the federal deficit was $240 billion and the mean Dow Jones value was 3000?
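Both parts of this exercise can be worked with the deviation-form formulas for two independent variables. The sketch below is one possible approach; the function and variable names are illustrative, with x1 as the federal deficit and x2 as the Dow Jones average.

```python
def fit_two_predictors(x1, x2, y):
    """Return (b0, b1, b2) for y-hat = b0 + b1*x1 + b2*x2."""
    n = len(y)
    mean_x1, mean_x2, mean_y = sum(x1) / n, sum(x2) / n, sum(y) / n
    X1 = [v - mean_x1 for v in x1]
    X2 = [v - mean_x2 for v in x2]
    Y = [v - mean_y for v in y]
    s11 = sum(a * a for a in X1)
    s22 = sum(a * a for a in X2)
    s12 = sum(a * b for a, b in zip(X1, X2))
    sy1 = sum(a * b for a, b in zip(Y, X1))
    sy2 = sum(a * b for a, b in zip(Y, X2))
    denom = s11 * s22 - s12 ** 2
    b1 = (sy1 * s22 - sy2 * s12) / denom
    b2 = (sy2 * s11 - sy1 * s12) / denom
    b0 = mean_y - b1 * mean_x1 - b2 * mean_x2
    return b0, b1, b2


gnp_change = [2.5, -1.0, 4.0, 1.0, 1.5, 3.0]          # % change in GNP
deficit = [100.0, 400.0, 120.0, 200.0, 180.0, 80.0]   # billions of dollars
dow_jones = [2850, 2100, 3300, 2400, 2550, 2700]      # yearly mean

# i) Least squares equation.
b0, b1, b2 = fit_two_predictors(deficit, dow_jones, gnp_change)
print(f"GNP-hat = {b0:.4f} + ({b1:.6f}) deficit + ({b2:.6f}) dow")

# ii) Expected % change in GNP for a $240 billion deficit and a Dow of 3000.
predicted_change = b0 + b1 * 240 + b2 * 3000
print(f"predicted change in GNP: {predicted_change:.2f}%")
```

With only six observations the fitted coefficients are sensitive to each data point, so the prediction is best read as a rough estimate.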
• Multiple correlation analysis:
It is a measure of association between a dependent variable and several independent variables taken together.
The coefficient of multiple correlation is given by,
$R_{y.12} = \sqrt{\dfrac{r_{y1}^2 + r_{y2}^2 - 2\, r_{y1}\, r_{y2}\, r_{12}}{1 - r_{12}^2}}$
Its value always lies between 0 and 1.
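The formula for the coefficient of multiple correlation can be evaluated directly from the three pairwise correlations. In the sketch below the function name and the input values (0.6, 0.5, 0.2) are hypothetical, chosen only to illustrate the calculation.

```python
import math


def multiple_correlation(r_y1, r_y2, r_12):
    """Coefficient of multiple correlation R_y.12 from pairwise correlations."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


# Hypothetical pairwise correlations: y with x1, y with x2, x1 with x2.
R = multiple_correlation(0.6, 0.5, 0.2)
R_squared = R ** 2   # coefficient of multiple determination
print(f"R = {R:.4f}, R^2 = {R_squared:.4f}")

# When the two independent variables are uncorrelated (r_12 = 0),
# R^2 reduces to r_y1^2 + r_y2^2.
R_uncorrelated = multiple_correlation(0.6, 0.5, 0.0)
print(f"R with r_12 = 0: {R_uncorrelated:.4f}")
```

The formula assumes the three correlations are mutually consistent, as they are when computed from the same data set, and that r_12 is not exactly 1 or -1.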
• Coefficient of multiple determination:
It is the proportion of the total variation in the values of the dependent variable y that is accounted for, or explained, by the independent variables in the multiple regression model.
• The square of coefficient of multiple correlation is called Coefficient of multiple determination.