7/27/2019 Business Statistics L1
Business Statistics
Fall, 2013
Introduction
-
Lecture outline
What is statistics?
Summarizing the distribution of data
- capturing the central tendency
- spread
-
The word statistics originally meant the
collection of information about and for the
state.
It is now a scientific method of collecting and
analyzing data (making sense of
numerical/quantitative information) to assist
in making more effective decisions.
-
In statistics, we deal with uncertainty. We don't deal with "what is" but with "what probably is."
But what do we mean by "it is probably that"?
Language alone is inadequate to express the degree of uncertainty; we need a more formal structure for this purpose.
The language of probability will be the focus of the first part of this course.
-
Also, in statistics, we deal with samples. We
make statements about a population based on
the results of a sample. This is the focus of the
second part of this course.
Beware, some uncertainty will always remain.
-
Then, in future econometrics courses, you will
learn to use statistical tools to
- analyze relationships between variables in an economic context
- do forecasting
-
Caveat:
- Statistics provides useful tools to help managers in decision making.
- However, these tools are not intended as substitutes for the familiarity with the business environment that develops through years of study and accumulated experience.
- It is in alliance with other relevant expertise in the business environment that statistical methods have proved most valuable as management tools.
-
Statistics are everywhere. Wherever they are
used, the people who use them do so to speak
authoritatively.
So it is quite important to use the right statistic
for the job!
-
Data point, data point, data point,
Distribution of the data points
Characterize the shape of the distribution
- the center, usually the mean
- the spread, the variance
- the lopsidedness, the skewness
- (the "peakedness," the kurtosis)
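The four shape measures listed above can be computed directly from their moment definitions. The sketch below is a minimal illustration on a small hypothetical dataset; `shape_summary` is a made-up helper name, it uses population (divide-by-N) moments, and the kurtosis it returns is the plain fourth-moment ratio, for which a normal distribution gives 3.

```python
# Minimal sketch: population moment-based shape measures.
# `shape_summary` and `data` are hypothetical, for illustration only.

def shape_summary(xs):
    """Return (mean, variance, skewness, kurtosis) using divide-by-N moments."""
    n = len(xs)
    mean = sum(xs) / n                                   # center
    devs = [x - mean for x in xs]
    variance = sum(d ** 2 for d in devs) / n             # spread
    sd = variance ** 0.5
    skewness = sum(d ** 3 for d in devs) / n / sd ** 3   # lopsidedness
    kurtosis = sum(d ** 4 for d in devs) / n / sd ** 4   # "peakedness" (normal = 3)
    return mean, variance, skewness, kurtosis

data = [1, 2, 2, 3, 3, 3, 4, 4, 14]   # one large value creates a long right tail
mean, variance, skewness, kurtosis = shape_summary(data)
print(f"mean={mean:.2f} variance={variance:.2f} "
      f"skewness={skewness:.2f} kurtosis={kurtosis:.2f}")
```

A positive skewness here reflects the single large value stretching the right tail.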
-
Capturing the central tendency
After every exam, you will receive your own
score, and I will give you the average score of
the class. Why do I assume that you are
interested in that average score?
-
Capturing the central tendency
Now suppose I ask you to poll your classmates about their opinions on
making the market economy the core of any country's development
process, on a scale of 1 to 5, with 1 being strongly in favor of it and 5
being strongly against it.
It turns out that half of the class answered 1 and the other half answered 5.
When I ask you to tell me the result, that is, to summarize the class opinion
for me, would you add 1 and 5 and then divide the answer by 2? That's how
you calculate the average. If you do so, you will get 3, which indicates
indifference. Would you report to me that the general opinion in the class
on this point is actually quite indifferent?
1 = strongly agree, 2 = agree, 3 = indifferent, 4 = disagree, 5 = strongly disagree
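The poll example can be reproduced in a few lines. The sketch assumes a hypothetical class of 20 students split evenly between answers 1 and 5, and uses Python's standard `statistics` module (`multimode` requires Python 3.8 or later).

```python
from statistics import mean, multimode

# Hypothetical class of 20 students: half answer 1, half answer 5.
answers = [1] * 10 + [5] * 10

average = mean(answers)       # 3 ("indifferent"), yet no student chose 3
modes = multimode(answers)    # [1, 5]: the answers students actually gave
print("mean:", average, "modes:", modes)
```

The mean lands on a value nobody reported, while the two modes show the class is split.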
-
Capturing the central tendency
Suppose you are dealing with manufacturers
who produce clothing in various sizes. Is
knowing that the mean shirt size of European men
is 41.3, or that the average shoe size of American
women is 8.24, useful?
-
Capturing the central tendency
Let's consider the incomes or wealth of households in a city. Usually, a large proportion of the population has relatively
modest incomes, but the incomes of, say, the highest 10% of all earners can be very large.
In such a case, would you use the mean income to present a view of economic well-being in the city?
-
Capturing the central tendency
The average, or mean, is generally appropriate for summarizing the central tendency of numerical data.
But with categorical data, such as opinion scales, the mean is meaningless.
What is valuable for inventory decisions is not the mean size but the modal size: the size of items sold most often, that is, the size in heaviest demand.
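A minimal sketch of the inventory point, using a hypothetical list of shirt sizes sold; the data are invented for illustration. `collections.Counter.most_common(1)` returns the most frequent value together with its count.

```python
from collections import Counter
from statistics import mean

# Hypothetical record of shirt sizes sold, for illustration only.
sizes_sold = [39, 40, 41, 41, 42, 42, 42, 42, 43, 44]

mean_size = mean(sizes_sold)                                    # 41.6: not one of the sizes sold
modal_size, times_sold = Counter(sizes_sold).most_common(1)[0]  # 42, sold 4 times
print(f"mean size: {mean_size}, modal size: {modal_size} ({times_sold} sold)")
```

The modal size tells a buyer which size to stock most of; the mean size does not correspond to any item on the shelf.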
-
Capturing the central tendency
But even with numerical data, the mean can sometimes give misleading information about the center.
In the case of an income distribution, the mean income
can be inflated by the very wealthy. The very wealthy are also an illustration of outliers: values that lie far from the rest of the data.
Positive outliers tend to increase the mean but have little or no effect on the median. In such cases, the median is preferred to the mean for describing the center of the income distribution.
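To see the effect of a positive outlier, the sketch below compares two hypothetical income lists (in thousands of yuan) that differ only in their largest value. The mean jumps, while the median does not move, because the median depends only on the middle of the sorted data.

```python
from statistics import mean, median

# Hypothetical household incomes (thousands of yuan), for illustration only.
modest = [20, 22, 25, 28, 30, 32, 35, 38, 40]
# Same households, except the top earner is now very wealthy.
with_outlier = modest[:-1] + [940]

print(mean(modest), median(modest))              # 30 30
print(mean(with_outlier), median(with_outlier))  # 130 30
```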
-
Please calculate the
mean, the median, and
the mode, and tell us
which statistic more reasonably captures the
central tendency of this
dataset?
-
How can we determine if the mean is being heavily influenced by outliers?
The simple answer is: don't just look at the mean; look at more statistics. If the
mean and the median are not close together, then the mean may be affected by
outliers, as is the case in this example.
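One way to turn "are the mean and the median close?" into a number is to divide their gap by the standard deviation; that ratio always lies between -1 and 1. The sketch below is a hypothetical rule of thumb, not a standard test: the function name and the 0.2 cutoff are assumptions chosen for illustration.

```python
from statistics import mean, median, pstdev

def mean_may_be_outlier_driven(xs, cutoff=0.2):
    """Hypothetical rule of thumb: flag data whose mean and median differ by
    more than `cutoff` standard deviations (the cutoff is an arbitrary choice)."""
    spread = pstdev(xs)
    if spread == 0:          # all values equal: mean == median, nothing to flag
        return False
    gap = (mean(xs) - median(xs)) / spread
    return abs(gap) > cutoff

incomes = [20, 22, 25, 28, 30, 32, 35, 38, 940]   # hypothetical, one very rich household
print(mean_may_be_outlier_driven(incomes))          # True
print(mean_may_be_outlier_driven([1, 2, 3, 4, 5]))  # False
```

A flag like this only says "look closer"; graphing the data remains the better check.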
-
Even if the mean and the median are equal in
a dataset, does it mean that we can use either
one to adequately capture the central
tendency?
-
[Figure: salary histogram (x-axis: Salary; y-axis: Frequency), with mean salary = median salary marked]
The mean and median salaries are some of the least frequently
reported values. These salaries appear to be bimodal. Perhaps in
this case both staff and executive salaries have been collected.
Because there are two frequently occurring values, the two modal
salary values may be the best way to summarize the dataset.
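A hypothetical, perfectly symmetric set of staff and executive salaries (in thousands) reproduces the situation in the figure: the mean and the median coincide at a value nobody earns, while `statistics.multimode` recovers both peaks.

```python
from statistics import mean, median, multimode

# Hypothetical salaries (thousands): a staff cluster and an executive cluster.
salaries = [30, 30, 30, 35, 40, 110, 115, 120, 120, 120]

print(mean(salaries), median(salaries))  # 75 75.0
print(multimode(salaries))               # [30, 120]
```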
-
Running the numbers to get mean, median,
and mode is simply not sufficient.
Graph the data before deciding how best to
summarize a dataset.
-
Zhijiang's migrant income in 2007
The intervals into which the data are broken down are called bins (or classes).
The numbers of observations in each class are called frequencies.
A histogram is a representation of the tabulated frequencies over specified bins.
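Tabulating frequencies is a simple counting exercise. The sketch below uses invented monthly incomes (not the Zhijiang data) and a bin width of 500 yuan chosen for illustration; each bin includes its lower edge and excludes its upper edge. `frequency_table` is a hypothetical helper, and the text bars stand in for a plotted histogram.

```python
def frequency_table(xs, width):
    """Count integer observations in bins [edge, edge + width), keeping empty bins."""
    start = (min(xs) // width) * width
    stop = (max(xs) // width) * width
    table = {edge: 0 for edge in range(start, stop + width, width)}
    for x in xs:
        table[(x // width) * width] += 1   # lower edge of the bin containing x
    return table

# Hypothetical monthly incomes in yuan, for illustration only.
incomes = [600, 800, 900, 1000, 1000, 1000, 1100, 1200, 1500, 2500]
bin_width = 500

for edge, freq in frequency_table(incomes, bin_width).items():
    print(f"{edge:>5}-{edge + bin_width:<5} {'*' * freq}")
```

Keeping empty bins (here 2000-2500) matters: dropping them would hide gaps in the distribution.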
-
You can tell quite a bit about a variable by looking at a chart of its frequency distribution.
The migrant income histogram clearly stretches out to the right; we call this positively skewed. We can tell that the mean is greater than the median in this case:
Mean: 1234 yuan; Median: 1000 yuan; Mode: 1000 yuan
-
A word on skewness
Skewness describes the direction and relative magnitude of the pull on the mean, which is
also the direction in which the tail of a graphed dataset stretches.
When the mean is pulled toward higher values, we say there is positive, or right, skewness.
When the mean is pulled toward lower values, we say there is negative, or left, skewness.
One type of distribution with zero skewness is the symmetric distribution. In a symmetric
distribution, the mean is equal to the median.
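The link between tail direction and the mean-median ordering can be checked numerically with the moment coefficient of skewness (average cubed deviation divided by the cube of the standard deviation), which is one common definition among several. The datasets below are hypothetical.

```python
from statistics import mean, median, pstdev

def skewness(xs):
    """Moment coefficient of skewness: mean cubed deviation / sd ** 3."""
    m, s = mean(xs), pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / len(xs) / s ** 3

symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 2, 2, 3, 10]       # hypothetical, long tail to the right
left_tail = [-x for x in right_tail]   # mirror image, long tail to the left

for name, xs in [("symmetric", symmetric), ("right", right_tail), ("left", left_tail)]:
    print(f"{name:>9}: skewness={skewness(xs):+.2f} "
          f"mean={mean(xs):.2f} median={median(xs):.2f}")
```

Positive skewness goes with mean above median, negative with mean below, and the symmetric set has both equal.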
-
The variability or spread
of a distribution
When we have two datasets with the same
mean, how can we tell which dataset is
- more variable?
- more volatile?
- less precise?
- less predictable?
-
The variability or spread
of a distribution
The easiest way to think about the volatility of
a dataset:
the range of the dataset (the largest value minus the smallest value)
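A minimal sketch of the range as a spread measure, using two hypothetical datasets that share the same mean of 50. Because the range uses only the largest and smallest values, a single extreme observation can change it sharply.

```python
from statistics import mean

def data_range(xs):
    """Range = largest value minus smallest value."""
    return max(xs) - min(xs)

steady = [48, 49, 50, 51, 52]     # hypothetical
volatile = [10, 30, 50, 70, 90]   # hypothetical, same mean

print(mean(steady), data_range(steady))      # 50 4
print(mean(volatile), data_range(volatile))  # 50 80
```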
-
The variability or spread
of a distribution
What about how far each point is from the mean?
Shouldn't the dataset with the higher average distance from the mean be more spread out, or more variable?
We can express this idea using the following formula, where $\mu$ is the mean of the $N$ data points:
$$\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)$$
-
The variability or spread
of a distribution
But this formula always equals zero!
(TA session)
We must improve this formula slightly so that deviations on either side of the mean don't offset each other in the aggregate. To get rid of the offsets, we could either use absolute distances or square the distances.
We choose to square the distances, which is easier to deal with mathematically in many applications.
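The three candidate averages can be compared on a small hypothetical dataset. The plain average deviation is zero for any data, since $\sum_{i=1}^{N}(x_i - \mu) = \sum_{i=1}^{N} x_i - N\mu = 0$ when $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$; averaging absolute or squared deviations removes the cancellation.

```python
# Hypothetical data, chosen so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mu = sum(data) / n                       # 5.0
deviations = [x - mu for x in data]      # -3.0, -1.0, -1.0, -1.0, 0.0, 0.0, 2.0, 4.0

avg_deviation = sum(deviations) / n                      # 0.0: signs cancel
avg_abs_deviation = sum(abs(d) for d in deviations) / n  # 1.5
avg_sq_deviation = sum(d ** 2 for d in deviations) / n   # 4.0
print(avg_deviation, avg_abs_deviation, avg_sq_deviation)
```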
-
The variability or spread
of a distribution
Now we compute the mean of the squared deviations from
the mean.
We call the mean squared deviation from the
mean the statistical variance.
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$
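The variance formula translates almost word for word into code. The sketch below uses a hypothetical dataset and checks the hand-written version against `statistics.pvariance`, which also divides by N; note that `statistics.variance` divides by N - 1 instead (the sample variance), so it would give a different number here.

```python
from statistics import pvariance

def population_variance(xs):
    """sigma^2 = (1/N) * sum of (x_i - mu)^2, with mu the mean of xs."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical
print(population_variance(data), pvariance(data))
```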
-
The variability or spread
of a distribution
But along comes another problem: variance is measured in the units of the data, squared.
Wouldn't it be better to use a spread statistic that is expressed in the same units as the data being studied?
So we take the square root of the variance.
This is called the standard deviation.
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$
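Taking the square root brings the spread back to the data's own units. In the hypothetical example below the values are read as yuan, so the variance is in yuan squared and the standard deviation is in yuan; `statistics.pstdev` computes the same divide-by-N quantity.

```python
import math
from statistics import pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values, read as yuan
n = len(data)
mu = sum(data) / n
variance = sum((x - mu) ** 2 for x in data) / n   # 4.0 (yuan squared)
sd = math.sqrt(variance)                          # 2.0 (yuan)
print(sd, pstdev(data))
```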
-
Look at two investment funds below:
                              MBA Student Fund A   MBA Student Fund B
Average return over 10 years  5%                   5%
Median return                 7%                   2%
Standard deviation            10%                  1%
In which fund would you invest?