Linear Regression Model: The National Veterans' Organization
Kirandeep Kaur
BSIS 610
Analysis Collaborators: Yesenia Ortiz and Divya Tanna
December 12, 2014
This research report will be divided into the following sections.
1. How the National Veterans’ Organization can use a model for predicting donors’
gift amounts.
2. An explanation of using descriptive statistics on the output variable, average gift
amount.
3. Graphs that portray the strength of relationships between each of the 11 variables
and the output variable, average gift amount.
4. A multiple linear regression model that includes only selected input variables and
their relationships with average gift amount.
5. Recommendations for the multiple regression model.
1. Introduction:
A model can be used in several ways to predict donors’ average gift amounts to the
National Veterans’ Organization. Average gift amount refers to how much individuals donate on
average to the organization. Lawrence Henze describes four advantages of building a model for
an organization, which are outlined next. First, a model enables the organization to analyze
everyone in its database and select individuals who are more likely to donate based on their
profiles. Second, a model helps the organization better understand which potential donors are
likely to donate the most. Third, a model allows the organization to select a marketing
technique that effectively targets prospective donors. Fourth, a model saves the
organization time and energy by focusing its efforts on the individuals who are most likely to
donate.
2. Average Gift Descriptive Statistics:
Descriptive Statistics
Mean 10.9828652
Median 9.4
Mode 15
Standard Deviation 7.50158549
On average, individuals donate $10.98 to the National Veterans’ Organization.
Furthermore, $15 is the value that appears most often in the data (the mode), although this
does not mean that most individuals donate $15.
Since the mean ($10.98) is higher than the median ($9.40), the distribution is skewed to the
right by high outliers, such as $30 and $50 in the raw data.
Because the outliers pull the mean upward, they also inflate the standard deviation, which is
computed from the squared deviations of each value from the mean.
Therefore, analysts cannot predict precisely how much each person will donate: without a
model, a donor’s average gift typically differs from the mean by about $7.50.
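As a minimal sketch of how these four statistics are computed, and of why a few large gifts pull the mean above the median, the following Python snippet uses the standard statistics module. The gift amounts are hypothetical values chosen for illustration, not the organization's data.

```python
import statistics

# Hypothetical average-gift amounts in dollars (NOT the report's data).
gifts = [5, 7, 8, 9, 10, 15, 15, 15, 30, 50]

mean = statistics.mean(gifts)
median = statistics.median(gifts)
mode = statistics.mode(gifts)
stdev = statistics.stdev(gifts)  # sample standard deviation

print(f"mean={mean:.2f} median={median} mode={mode} stdev={stdev:.2f}")

# Dropping the two largest gifts shows how strongly they pull the mean upward.
trimmed = sorted(gifts)[:-2]
trimmed_mean = statistics.mean(trimmed)
print(f"mean without the two largest gifts={trimmed_mean:.2f}")
```

In this toy sample the mean exceeds the median, the same pattern the report observes, and the mean drops once the two largest gifts are removed.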
BIN Frequency Cumulative %
10 1184 56.41%
20 761 92.66%
30 120 98.38%
40 18 99.24%
50 6 99.52%
60 2 99.62%
70 3 99.76%
80 2 99.86%
90 1 99.90%
100 2 100.00%
110 0 100.00%
120 0 100.00%
More 0 100.00%
The following points pertain to the frequency table and histogram.
Based on the data, there is a 56.41 percent chance that individuals will donate $10 or less on average.
There is a 98.38 percent chance that individuals will donate $30 or less on average.
There is a 100 percent chance that individuals will donate $100 or less on average.
[Figure: Histogram of AVGGIFT. Horizontal axis: AVGGIFT bins ($10 to $150, then More); left vertical axis: Frequency (0 to 1500); right vertical axis: Cumulative % (0% to 120%).]
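The cumulative percentages in the frequency table can be reproduced from the bin counts alone. The sketch below copies the bin upper bounds and frequencies from the table (empty bins above $100 are omitted) and accumulates them; the variable names are illustrative.

```python
# Bin upper bounds ($) and frequencies copied from the frequency table.
bins = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
freqs = [1184, 761, 120, 18, 6, 2, 3, 2, 1, 2]

total = sum(freqs)  # number of donors in the sample

running = 0
cumulative_pct = []
for count in freqs:
    running += count
    cumulative_pct.append(round(100 * running / total, 2))

for upper, pct in zip(bins, cumulative_pct):
    print(f"<= ${upper}: {pct:.2f}%")
```

Each cumulative percentage is the share of donors whose average gift falls at or below that bin's upper bound.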
3. Potential Input Variables
Homeowners and non-homeowners donate approximately the same amount on average.
There is no correlation between homeowner and average gift amount (AVGGIFT).
The linear regression model will exclude the input variable, homeowner, because the
trendline is nearly flat and there is no relationship between homeowner and
AVGGIFT.
[Figure: Relationship Between Homeowner and AVGGIFT ($). Scatter plot of AVGGIFT ($) against Homeowner, with linear trendline y = -0.0298x + 11.005, R² = 3E-06.]
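The trendline equation and R² shown on each scatter plot come from a simple least-squares fit. The sketch below shows one way to compute them in plain Python; the function name fit_trendline and the six data points are hypothetical and only illustrate a flat trendline like the homeowner one. Note that R² is a fraction, so it is multiplied by 100 to express the percent of variation explained (for example, an R² of 0.0011 is 0.11 percent).

```python
def fit_trendline(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R² = 1 - (unexplained variation / total variation)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical data: homeowner flag (0 or 1) and average gift in dollars.
homeowner = [0, 0, 0, 1, 1, 1]
avg_gift = [8.0, 11.0, 14.0, 9.0, 11.0, 13.0]

slope, intercept, r2 = fit_trendline(homeowner, avg_gift)
print(f"y = {slope:.4f}x + {intercept:.3f}, R² = {r2:.4f} ({100 * r2:.2f}% of variation)")
```

In this toy data both groups have the same mean gift, so the slope and R² are zero, which is what a flat trendline with no relationship looks like numerically.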
The R² states that the number of children (NUMCHILD) explains only 0.11 percent of
the variation in average gift amount (AVGGIFT). Therefore, the R² shows a weak
correlation between the two variables.
As parents have more children, they tend to donate slightly less on average to the organization.
The linear regression model will exclude this input variable because a weak correlation
between NUMCHILD and AVGGIFT exists.
[Figure: Relationship Between NUMCHILD and AVGGIFT ($). Scatter plot of AVGGIFT ($) against NUMCHILD, with linear trendline y = -0.6572x + 11.693, R² = 0.0011.]
Individuals’ average gifts change very little as their incomes decrease or increase.
Based on this finding, a weak correlation between income and average gift
amount (AVGGIFT) exists.
The linear regression model will exclude the input variable, income, because a weak
correlation between income and AVGGIFT exists.
[Figure: Relationship Between Income and AVGGIFT ($). Scatter plot of AVGGIFT ($) against Income, with linear trendline y = 0.5505x + 8.8438, R² = 0.0145.]
Regardless of gender, individuals appear to donate approximately the same amount on average.
Based on this finding, almost no relationship between gender and average gift
amount (AVGGIFT) exists.
The linear regression model will exclude the input variable, gender, because gender has
almost no correlation with AVGGIFT and the trendline is nearly flat.
The R² states that wealth explains only 0.82 percent of the variation in average gift amount
(AVGGIFT), which shows almost no correlation between the two variables.
The linear regression model will exclude wealth as an input variable because there is almost no
relationship between wealth and AVGGIFT and the trendline is nearly flat.
[Figure: Relationship Between Gender and AVGGIFT ($). Scatter plot of AVGGIFT ($) against Gender, with linear trendline y = -1.1609x + 11.686, R² = 0.0057.]
[Figure: Relationship Between Wealth and AVGGIFT ($). Scatter plot of AVGGIFT ($) against Wealth, with linear trendline y = 0.2721x + 9.2329, R² = 0.0082.]
The graph shows that the higher a potential donor’s average home value (HV), the
more the potential donor tends to donate on average.
Although the R² is higher than in most other graphs, there is still a weak relationship
between HV and average gift amount (AVGGIFT).
The linear regression model will include HV as an input because the R² is higher for this
relationship than for most other relationships (from other graphs).
Also, the linear regression model will include this input variable because the trendline’s
slope is going upward.
[Figure: Relationship Between HV and AVGGIFT ($). Scatter plot of AVGGIFT ($) against HV ($ in hundreds), with linear trendline y = 0.0013x + 9.53, R² = 0.0257.]
As donors’ median family income increases, individuals tend to donate slightly more to the
organization, on average.
Although the R² is higher for median family income (ICMED) and average gift amount
(AVGGIFT) compared to other graphs, there is still a weak correlation between the two
variables.
The linear regression model will include the input variable, ICMED, because the R²
of 0.0187 (1.87 percent) is higher for ICMED and AVGGIFT than for several other input
variables with AVGGIFT.
In addition, the linear regression model will include ICMED because the trendline’s
slope is going upward.
[Figure: Relationship Between ICMED and AVGGIFT ($). Scatter plot of AVGGIFT ($) against ICMED ($ in hundreds), with linear trendline y = 0.006x + 8.6391, R² = 0.0187.]
In general, as average family income (ICAVG) increases, families tend to donate slightly more to
the organization.
Although the R² of 0.0155 (1.55 percent) is high for ICAVG and average gift amount
(AVGGIFT) compared to several other graphs, there is still a weak correlation between
the two variables.
The linear regression model includes ICAVG because the R² is high compared to other
graphs and the slope is going upward.
[Figure: Relationship Between ICAVG and AVGGIFT ($). Scatter plot of AVGGIFT ($) against Average Family Income ($ in hundreds), with linear trendline y = 0.0056x + 8.561, R² = 0.0155.]
As the percentage of households earning less than $15K in potential donors’ neighborhoods
(IC15) increases, individuals tend to donate less on average. In other words, donors from
neighborhoods with a larger share of low-income households tend to give smaller amounts.
Since the R² is low, there is a weak correlation between IC15 and average gift amount
(AVGGIFT).
Since the R² of 0.0047 (0.47 percent) is approximately in the same range as most R² values
from other graphs, the linear regression model will exclude the input variable IC15.
[Figure: Relationship Between IC15 and AVGGIFT. Scatter plot of AVGGIFT ($) against IC15 (%), with linear trendline y = -0.0427x + 11.611, R² = 0.0047.]
In general, as individuals receive more promotions, they tend to donate less on average to the
organization.
Although the R² of 0.0277 (2.77 percent) is higher than most of the other R² values, there is
still a weak correlation between number of promotions (NUMPROM) and average gift amount
(AVGGIFT).
NUMPROM refers to the number of promotions the organization sends to donors.
The linear regression model will include NUMPROM because the trendline’s slope is
going downward and the R² is high compared to most of the R² values in other graphs.
[Figure: Relationship Between NUMPROM and AVGGIFT. Scatter plot of AVGGIFT ($) against NUMPROM, with linear trendline y = -0.0556x + 13.654, R² = 0.0277.]
There is almost no correlation between number of months since last donation
(TOTALMONTHS) and average gift amount (AVGGIFT).
The linear regression model will exclude TOTALMONTHS as an input variable because
there is almost no correlation between TOTALMONTHS and AVGGIFT. In other words, the
linear regression model will exclude TOTALMONTHS because the trendline is nearly flat.
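The include/exclude decisions in this section can be summarized by ranking the 11 candidate inputs by their single-variable R². The sketch below uses the R² values printed on the scatter plots; the cutoff of four variables is an assumption inferred from the report's choices, not a rule stated by the author.

```python
# R² values taken from the trendline labels of the 11 scatter plots.
r_squared = {
    "homeowner": 3e-06,
    "NUMCHILD": 0.0011,
    "income": 0.0145,
    "gender": 0.0057,
    "wealth": 0.0082,
    "HV": 0.0257,
    "ICMED": 0.0187,
    "ICAVG": 0.0155,
    "IC15": 0.0047,
    "NUMPROM": 0.0277,
    "TOTALMONTHS": 0.0023,
}

TOP_N = 4  # assumed cutoff; the report keeps four input variables

ranked = sorted(r_squared.items(), key=lambda item: item[1], reverse=True)
selected = [name for name, _ in ranked[:TOP_N]]

for name, r2 in ranked:
    status = "include" if name in selected else "exclude"
    print(f"{name:12s} R² = {r2:.6f} ({100 * r2:.2f}%) -> {status}")
```

Ranked this way, the four highest R² values belong to NUMPROM, HV, ICMED and ICAVG, which matches the variables the report carries into the multiple regression. Income (0.0145) falls just below ICAVG (0.0155), so the cutoff is a judgment call.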
[Figure: Relationship Between TOTALMONTHS and AVGGIFT. Scatter plot of AVGGIFT ($) against TOTALMONTHS, with linear trendline y = 0.0897x + 8.1683, R² = 0.0023.]
4. Multiple Linear Regression Model
Regression Model
Input Variable Coefficient
Intercept 12.6402
HV 0.0001
ICMED 0.0162
ICAVG -0.0158
NUMPROM -0.0475
(a) Multiple Linear Regression Model
Average Gift Amount = 12.6402
+ (0.0001 * HV)
+ (0.0162 * ICMED)
+ (-0.0158 * ICAVG)
+ (-0.0475 * NUMPROM)
*NOTE: The values substituted below for HV, ICMED, ICAVG and NUMPROM are the medians of these input variables.
Average Gift Amount = 12.6402 + (0.0001 * 822) + (0.0162 * 357) + (-0.0158 * 398) + (-0.0475 * 46)
Average Gift Amount = 12.6402 + 0.0822 + 5.7834 - 6.2884 - 2.185
Average Gift Amount = $10.03
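The substitution above can be checked with a few lines of Python. This is a sketch that uses the reported intercept, coefficients and median input values; the function name predict_avg_gift is illustrative.

```python
# Intercept and coefficients as reported in the regression model table.
intercept = 12.6402
coefficients = {"HV": 0.0001, "ICMED": 0.0162, "ICAVG": -0.0158, "NUMPROM": -0.0475}

# Median values of the input variables, as used in the report's substitution.
medians = {"HV": 822, "ICMED": 357, "ICAVG": 398, "NUMPROM": 46}


def predict_avg_gift(inputs):
    """Predicted average gift ($) = intercept + sum of coefficient * input value."""
    return intercept + sum(coefficients[name] * inputs[name] for name in coefficients)


prediction = predict_avg_gift(medians)
print(f"Predicted average gift: ${prediction:.2f}")
```

The same function can score any donor profile by passing a dictionary with that donor's HV, ICMED, ICAVG and NUMPROM values.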
(b) Summary of Training Sample Statistics
R² 0.0483
The R² of 0.0483 (4.83 percent) implies a weak correlation between the selected input
variables and the output variable, average gift amount (AVGGIFT).
RMS Error 7.325
The training RMS error (7.325) is about 0.015 lower than the standard deviation from the multiple regression model (7.3399).
Standard Deviation
Descriptive Statistics (Without Model) 7.501585
With Multiple Regression Model 7.3399
Since the standard deviation from the multiple linear regression model is lower than the
standard deviation from the descriptive statistics, the linear regression model would make
a slightly more precise prediction of how much individuals would donate on average.
Basically, the lower the standard deviation, the more precise the prediction of the
output variable.
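To put the two standard deviations side by side, the sketch below computes how much the model narrows the spread relative to using no model. It uses only the two reported values; the variable names are illustrative.

```python
baseline_sd = 7.501585  # standard deviation of AVGGIFT without a model
model_sd = 7.3399       # standard deviation reported with the regression model

reduction = baseline_sd - model_sd
reduction_pct = 100 * reduction / baseline_sd

print(f"Reduction: ${reduction:.4f} ({reduction_pct:.1f}% of the baseline)")
```

The reduction is roughly 16 cents, or about 2 percent of the baseline, so the gain in precision is modest, which is consistent with the low R².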
(c) Description of How Well the Model Performs on New Data
Data Set RMS Error
Training Data 7.325
Validation Data 7.257
Validation Data - Training Data
7.257 - 7.325 = -0.068
Regardless of whether the difference is negative or positive, the training RMS error and
the validation RMS error are quite close.
Since the two errors differ by only 0.068, the multiple linear regression model appears to
perform about as well on new data as on the training data.
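For reference, RMS error is the square root of the mean squared difference between actual and predicted values. The sketch below defines it and then repeats the training-versus-validation comparison with the reported figures. The function name rms_error and the four sample gifts are hypothetical.

```python
import math


def rms_error(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical actual and predicted average gifts, only to exercise the function.
actual = [10.0, 15.0, 5.0, 20.0]
predicted = [11.0, 12.0, 9.0, 18.0]
example_rmse = rms_error(actual, predicted)
print(f"Example RMS error: {example_rmse:.4f}")

# RMS errors reported for the model.
training_rmse = 7.325
validation_rmse = 7.257
gap = validation_rmse - training_rmse
print(f"Validation - training: {gap:.3f}")
```

A validation error that is no higher than the training error gives no sign of overfitting, although both errors remain close to the standard deviation of the raw data.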
5. Recommendations:
In order to improve the linear regression model, analysts should consider interest group
(association with veterans or individuals currently serving in the army, air force, etc.) and
church attendance as additional input variables.
Based on Burks (2014), the analysts should add interest group as an additional input
variable. Individuals who know a veteran or an individual currently serving in the army, air
force, etc. may empathize with the organization’s cause and may donate more to the
organization than individuals who do not know a veteran or person currently serving in
the army, air force, etc. In addition, by showing “empathy in their
marketing messages, these charitable organizations can reach more people who want to
offer monetary aid to those causes” (Burks 2014).
With this variable, analysts could confirm whether individuals who know a veteran or
an individual currently serving in the army, air force, etc., are more likely to donate to
the National Veterans’ Organization.
Furthermore, analysts should consider adding church attendance as an additional input
variable. Because “giving in a religious setting occurs on a regular basis, the habit of
giving can develop quite readily” (DiDonato 2012). In other words, individuals who attend
church regularly may tend to donate more on average.
Therefore, analysts could gain insight into whether regular churchgoers tend to donate
more on average.
References
Burks, Robin (2014, July 29). A new study reveals why some people donate to charity more than
others. Retrieved from
http://www.techtimes.com/articles/11536/20140729/a-new-study-reveals-why-some-people-donate-to-charity-more-than-others.htm#ixzz3LhgDU8zx
DiDonato, Nicholas C. (2012, September 30). Not conservatives, but religious people, more
charitable. Retrieved from
http://www.patheos.com/blogs/scienceonreligion/2012/09/not-conservatives-but-religious-people-more-charitable/
Henze, Lawrence. Using Statistical Modeling to Increase Donations. Retrieved from
https://www.blackbaud.com/files/resources/downloads/WhitePaper_TargetAnalytics_StatisticalModeling.pdf