business statistics l1

7/27/2019 Business Statistics L1

1/34

Business Statistics

Fall, 2013

Introduction


2/34

Lecture outline

What is statistics?

Summarizing the distribution of data

- capturing the central tendency

- spread


3/34

The word statistics originally meant the

collection of information about and for the

state.

It is now a scientific method of collecting and

analyzing data (making sense of

numerical/quantitative information) to assist

in making more effective decisions.


4/34

In statistics, we deal with uncertainty. We don't deal with "what is" but with "what probably is."

But what do we mean by "it is probably that"?

Language alone is inadequate to express the degree of uncertainty; we need a more formal structure for this purpose.

The language of probability will be the focus of the first part of this course.


5/34

Also, in statistics, we deal with samples. We

make statements about a population based on

the results of a sample. This is the focus of the

second part of this course.

Beware, some uncertainty will always remain.


6/34

Then, in future econometrics courses, you will learn to use statistical tools to

- analyze relationships among variables in an economic context

- do forecasting


7/34

Caveat:

- Statistics provides useful tools for managers to help them in decision making.

- However, these tools are not intended as substitutes for the familiarity with the business environment that develops through years of study and accumulated experience.

- It is in alliance with other relevant expertise in the business environment that statistical methods have proved most valuable as management tools.


8/34

Statistics are everywhere. Wherever they are

used, those who use them use them to speak

authoritatively.

It is quite important to use the right statistic for the job!


9/34

Data point, data point, data point,

Distribution of the data points

Characterize the shape of the distribution

- the center, usually the mean

- the spread, the variance

- the lopsidedness, the skewness

- (the "peakedness", the kurtosis)


10/34

Capturing the central tendency

After every exam, you will receive your own score, and I will give you the average score of the class. Why do I assume that you are interested in that average score?


11/34


Now suppose I ask you to poll your classmates about their opinions on making the market economy the core of any country's development process, on a scale of 1 to 5, with 1 being strongly in favor of it and 5 being strongly against it.

It turns out that half of the class answered 1 and the other half answered 5. When I ask you to tell me the result, that is, to summarize the class opinion for me, would you add 1 and 5 and then divide the answer by 2? That's how you calculate the average. If you do so, you will get 3, which indicates indifference. Would you report to me that the general opinion in the class on this point is actually quite indifferent?

1 = strongly agree, 2 = agree, 3 = indifferent, 4 = disagree, 5 = strongly disagree
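The poll scenario above can be checked numerically. The following is a minimal sketch, assuming a hypothetical class of 40 students (the class size is not given on the slide); it uses only Python's standard `statistics` module.

```python
from statistics import mean, multimode

# Hypothetical class of 40 students: half answer 1, half answer 5.
responses = [1] * 20 + [5] * 20

average = mean(responses)                    # arithmetic mean of the scale codes
modes = multimode(responses)                 # all values tied for most frequent
count_at_average = responses.count(average)  # how many students actually chose the mean

print("mean:", average)
print("modes:", modes)
print("students who answered the mean value:", count_at_average)
```

In this sketch the mean lands on a value that nobody chose, while the two modes describe what the class actually said.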


12/34


Suppose you are dealing with manufacturers who produce clothing in various sizes. Is it useful to know that the mean shirt size of European men is 41.3, or that the average shoe size of American women is 8.24?


13/34


Let's consider the incomes or wealth of households in a city. Usually, a large proportion of the population has relatively modest incomes, but the incomes of, say, the highest 10% of all earners can be very large.

In such a case, would you use mean income to present the view of economic well-being in the city?


14/34


The average, or mean, is generally appropriate for summarizing a dataset's central tendency when we have numerical data.

But with categorical data, such as opinion scales, the mean is meaningless.

What is valuable for inventory decisions is not the mean size but the modal size: the size of the items sold most often, that is, the size in heaviest demand.


15/34


But even with numerical data, the mean can sometimes give misleading information about the center.

In the case of income distribution, the mean income can be inflated by the very wealthy. The very wealthy are also an illustration of outliers: values that lie far from the rest of the data.

Positive outliers tend to increase the mean but have little or no effect on the median. The median is therefore preferred to the mean in such cases to describe the center of the income distribution.
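To see how differently the two statistics react to a single very wealthy household, here is a small sketch. The income figures are invented for illustration and are not taken from the lecture.

```python
from statistics import mean, median

# Hypothetical monthly household incomes (invented numbers).
modest_incomes = [800, 1000, 1000, 1200, 1500]
# The same city after one very wealthy household (an outlier) is added.
with_outlier = modest_incomes + [60500]

mean_before, median_before = mean(modest_incomes), median(modest_incomes)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("without outlier: mean =", mean_before, "median =", median_before)
print("with outlier:    mean =", mean_after, "median =", median_after)
```

With these numbers the mean grows tenfold once the outlier is included, while the median moves only slightly.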


16/34


17/34


18/34


19/34

Please calculate the mean, the median, and the mode, and tell us which statistic more reasonably captures the central tendency of this dataset.


20/34

How can we determine if the mean is being heavily influenced by outliers?

The simple answer is: don't just look at the mean, look at more statistics. If the mean and the median are not close together, then the mean may be affected by outliers, as is the case in this example.


21/34

Even if the mean and the median are equal in

a dataset, does it mean that we can use either

one to adequately capture the central

tendency?


22/34

[Figure: frequency distribution of salaries (axes: Salary, Frequency), in which mean salary = median salary]

The mean and median salaries are some of the least frequently

reported values. These salaries appear to be bimodal. Perhaps in

this case both staff and executive salaries have been collected.

Because there are two frequently occurring values, the modal salary values may be the best way to summarize the dataset.


23/34

Running the numbers to get mean, median,

and mode is simply not sufficient.

Graph the data before deciding how best to

summarize a dataset.


24/34

Zhijiang's migrant income in 2007

The intervals into which the data are broken down are called bins (or classes).

The numbers of observations in each class are called frequencies.

A histogram is a representation of the tabulated frequencies over specified bins.
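The definitions of bins and frequencies above can be sketched in a few lines of code. The incomes and the bin width of 1000 yuan below are assumptions chosen for illustration; they are not the survey data behind the slide's histogram.

```python
from collections import Counter

# Hypothetical monthly incomes in yuan (not the survey data from the slide).
incomes = [450, 800, 950, 1000, 1000, 1200, 1350, 1800, 2600, 4200]
bin_width = 1000  # assumed width of each bin

# Map each observation to the lower edge of its bin, then count per bin.
frequencies = Counter((x // bin_width) * bin_width for x in incomes)

# Print a text histogram, including bins that contain no observations.
for lower in range(0, max(frequencies) + bin_width, bin_width):
    upper = lower + bin_width
    print(f"[{lower:>4}, {upper:>4}): {'#' * frequencies[lower]}")
```

Each printed row is one bin, and the number of `#` marks is that bin's frequency.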


25/34

You can tell quite a bit about a variable by looking at a chart of its frequency distribution.

The migrant income histogram clearly stretches out to the right; we call this positively skewed. We can tell that the mean is greater than the median in this case.

Mean: 1234 yuan

Median: 1000 yuan

Mode: 1000 yuan


26/34

A word on skewness

Skewness describes the direction and relative magnitude in which the mean is pulled, which is also the direction in which the tail of a graphed dataset extends.

When the mean is pulled to higher values, we say there is positive skewness, or right-skewness.

When the mean is pulled to lower values, we say there is negative skewness, or left-skewness.

There is a type of distribution that has zero skewness: the symmetric distribution. With a symmetric distribution, the mean is equal to the median.
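The slide describes skewness only in words. One common way to put a number on it is the moment coefficient of skewness (the average cubed standardized deviation); that choice of measure is an assumption here, not something the lecture specifies. A minimal sketch with invented datasets:

```python
from statistics import mean, median, pstdev

def skewness(data):
    """Moment coefficient of skewness: the mean of cubed standardized deviations."""
    mu = mean(data)
    sigma = pstdev(data)  # population standard deviation
    return sum(((x - mu) / sigma) ** 3 for x in data) / len(data)

right_skewed = [1, 2, 2, 3, 3, 3, 4, 12]  # hypothetical data with a long right tail
symmetric = [1, 2, 3, 4, 5]               # hypothetical data, mirror image around 3

print("right-skewed:", skewness(right_skewed),
      "mean > median:", mean(right_skewed) > median(right_skewed))
print("symmetric:   ", skewness(symmetric),
      "mean == median:", mean(symmetric) == median(symmetric))
```

The sign of the result gives the direction of the skew; for the symmetric dataset the value is zero up to floating-point rounding.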


27/34

The variability or spread of a distribution

When we have two datasets with the same mean, how can we tell which dataset is

- more variable?

- more volatile?

- less precise?

- less predictable?


28/34


The variability or spread of a distribution

The easiest way to think about the volatility of

a dataset:

Range of the dataset
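As a quick illustration of the range, the sketch below compares two invented datasets that share the same mean. The helper name `data_range` is hypothetical.

```python
from statistics import mean

# Two hypothetical datasets with the same mean.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

def data_range(data):
    """Range = largest observation minus smallest observation."""
    return max(data) - min(data)

print("means: ", mean(tight), mean(wide))
print("ranges:", data_range(tight), data_range(wide))
```

Both means are identical, so only the range reveals that the second dataset is far more spread out.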


29/34


30/34


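The variability or spread of a distribution

What about how far each point is from the mean?

Shouldn't the dataset with the higher average distance from the mean be more spread out, or more variable?

We can express this idea using the following formula, where $x_i$ is the $i$-th data point, $\mu$ is the mean, and $N$ is the number of data points:

$$\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)$$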


31/34


The variability or spread of a distribution

But this formula always equals zero!

(TA session)

We must improve this formula slightly so that deviations on either side of the mean don't offset each other in the aggregate. To get rid of the offsets, we could either use absolute distances or square the distances.

We choose to square the distances because that is easier to deal with mathematically in many applications.
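The slide leaves the reason the average deviation from the mean is always zero to the TA session. A short derivation, writing $x_i$ for the data points, $N$ for their number, and $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$ for the mean (this notation is assumed here), runs as follows:

```latex
\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)
  = \frac{1}{N}\sum_{i=1}^{N} x_i \;-\; \frac{1}{N}\sum_{i=1}^{N}\mu
  = \mu - \frac{1}{N}\,(N\mu)
  = \mu - \mu
  = 0
```

The positive and negative deviations cancel exactly, whatever the data are.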


32/34


The variability or spread of a distribution

Now we create the mean squared deviation from the mean.

We call the mean squared deviation from the mean the statistical variance.

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$
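A small worked example may help fix the idea of the mean squared deviation. Take the hypothetical dataset $\{2, 4, 6\}$ (invented for illustration), so that $N = 3$ and the mean is $\mu = 4$:

```latex
\sigma^2
  = \frac{1}{3}\left[(2-4)^2 + (4-4)^2 + (6-4)^2\right]
  = \frac{1}{3}\,(4 + 0 + 4)
  = \frac{8}{3}
  \approx 2.67
```

The deviations $-2$ and $+2$ would cancel if simply averaged, but after squaring both contribute $4$.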


33/34


The variability or spread of a distribution

But along comes another problem. Variance is measured in the units of the data, squared.

Wouldn't it be better to use a spread statistic that is expressed in the same units as the data being studied?

So we take the square root of the variance.

And this is called the standard deviation.

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$
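The units point can be made concrete with a short sketch. It assumes a hypothetical dataset of delivery times measured in days, and uses the population versions (`pvariance`, `pstdev`) from Python's `statistics` module, which divide by N as in the formulas above.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical delivery times in days (invented numbers).
delivery_days = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(delivery_days)             # in days
variance = pvariance(delivery_days)  # in days squared
std_dev = pstdev(delivery_days)      # back in days

print("mean:", mu, "days")
print("variance:", variance, "days^2")
print("standard deviation:", std_dev, "days")
```

Because the standard deviation is in days, it can be compared directly with the mean, which the variance cannot.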


34/34

Look at two investment funds below:

                               MBA Student Fund A   MBA Student Fund B
Average return over 10 years   5%                   5%
Median return                  7%                   2%
Standard deviation             10%                  1%

In which fund would you invest?
