
BEA140 Lecture 3, Summer Semester 2009

By Leon Jiang, University of Tasmania


Some more points on univariate data


Central tendency

Mean Median Mode


Variance

Population: $\sigma^2 = \dfrac{\sum X_i^2 - (\sum X_i)^2 / N}{N}$

Sample: $s^2 = \dfrac{\sum X_i^2 - (\sum X_i)^2 / n}{n - 1}$


Standard deviation

$s = \sqrt{\dfrac{\sum X_i^2 - (\sum X_i)^2 / n}{n - 1}}$


The meaning of the standard deviation

"For most data batches around two thirds (or 68%) of the data will fall within one standard deviation of the mean, and around 95% within two standard deviations of the mean."

This is the empirical rule, a rule of thumb.


MEASURING FROM GROUPED DATA


Measuring for grouped data

When no raw data are available and we only have a secondary source, we have to analyze that secondary data set, which has already been grouped for reporting purposes.

A set of grouped data differs from a set of raw data in that its information has already been sorted into classes chosen by whoever reported it.

Because the individual values are lost and the class boundaries are somewhat arbitrary, measures computed from grouped data are approximations, so small errors relative to the raw data are to be expected.


Generally we use a frequency distribution table to show the grouping of data:

Time          Number of calls   Class mark
              fj                xj           fj·xj    fj·xj²   Cum. freq.
1 <= X < 3    11                2            22       44       11
3 <= X < 5    19                4            76       304      30
5 <= X < 7    10                6            60       360      40
7 <= X < 9    9                 8            72       576      49
9 <= X < 11   2                 10           20       200      51
11 <= X < 13  1                 12           12       144      52
13 <= X < 15  1                 14           14       196      53
15 <= X < 17  0                 16           0        0        53
17 <= X < 19  1                 18           18       324      54
19 <= X < 21  0                 20           0        0        54
Total         54                             294      2148


Class mark for frequency distribution of grouped data

The class mark, Xj, is a representative value for all observations located in the class.

A class mark is determined by the largest value and the smallest value of the class.

Xj = (largest value + smallest value) / 2

Xj = (RUCL + RLCL) / 2, where RUCL is the largest value and RLCL is the smallest value.

For example, the class 1 <= X < 3 has class mark (3 + 1) / 2 = 2.


Central tendency for grouped data

The mean of grouped data (g.d.) is defined as the weighted average of the class marks, with class frequencies as weights, i.e.

$\bar{X} = \dfrac{\sum f_j x_j}{n}$

For the call data: $\bar{X} = 294 / 54 = 5.44$
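As a quick check of the arithmetic, the grouped mean can be sketched in a few lines of Python. The function name `grouped_mean` is only an illustrative choice; the frequencies and class marks are those of the call table.

```python
# Frequencies f_j and class marks x_j taken from the call frequency table.
frequencies = [11, 19, 10, 9, 2, 1, 1, 0, 1, 0]
class_marks = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]


def grouped_mean(freqs, marks):
    """Weighted average of class marks, with class frequencies as weights."""
    n = sum(freqs)
    return sum(f * x for f, x in zip(freqs, marks)) / n


mean_calls = grouped_mean(frequencies, class_marks)
print(round(mean_calls, 2))  # 294 / 54 -> 5.44
```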


Median for g.d

1. Locate the median class: the class containing the median. But how and where?

- The total number of calls in the frequency distribution is 54 (an even number).

- Therefore, according to the formula for the position of the median, (n + 1) / 2, the median ought to be the 27.5th value.

- The class containing the 27.5th value is the median class. The cumulative frequencies show that the first class ends at the 11th value and the second at the 30th, so the median class is 3 <= X < 5.


FORMULA FOR MD:

MD = LCL + class width * (how far into class) / (how many in class)

MD = 3.0 + 2 * (27.5 - 11) / 19 = 4.74




Small errors are likely to exist most of the time:

Median from raw data = 4.4

Median from grouped data = 3.0 + 2 * (27.5 - 11) / 19 = 4.74


An example

MD = LCL + class width * (how far into class) / (how many in class)

Class         Freq.   Cum. freq.
80 &U 90      1       1
90 &U 100     2       3
100 &U 110    6       9
110 &U 120    3       12
120 &U 130    2       14
130 &U 140    2       16
Total         16

("&U" means "and under".) The median is the (16 + 1) / 2 = 8.5th value, which lies in the class 100 &U 110.

MD = 100 + 10 * (8.5 - 3) / (9 - 3)

Median = 109.17
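The interpolation can be sketched as a small Python function. This is a minimal sketch that assumes equal class widths and follows the lecture's convention that the median is the (n + 1) / 2-th value; the name `grouped_median` is illustrative.

```python
def grouped_median(lower_limits, width, freqs):
    """Interpolate the median inside the median class.

    Assumes equal class widths and takes the median to be the
    (n + 1) / 2-th value, as in the lecture.
    """
    n = sum(freqs)
    position = (n + 1) / 2
    cumulative = 0  # observations in the classes before the current one
    for lcl, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= position:
            # LCL + class width * (how far into class) / (how many in class)
            return lcl + width * (position - cumulative) / f
        cumulative += f
    raise ValueError("no observations")


calls = grouped_median([1, 3, 5, 7, 9, 11, 13, 15, 17, 19], 2,
                       [11, 19, 10, 9, 2, 1, 1, 0, 1, 0])
example = grouped_median([80, 90, 100, 110, 120, 130], 10,
                         [1, 2, 6, 3, 2, 2])
print(round(calls, 2), round(example, 2))  # 4.74 109.17
```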


Mode for g.d.

With grouped data, we tend to talk of a modal class, the class (or classes) with the highest frequency, rather than of the mode.

But if asked for a mode with grouped data, the best we can do is give the class mark of the modal class. For the call data:

Modal class: 3 &U 5 (19 observations)
Mode: 4 (class mark of the modal class)


Dispersion (variance) for grouped data

The sample variance formula is:

$s^2 = \dfrac{\sum f_j X_j^2 - (\sum f_j X_j)^2 / n}{n - 1}$

The population variance formula is:

$\sigma^2 = \dfrac{\sum f_j X_j^2 - (\sum f_j X_j)^2 / N}{N}$

Standard deviation = $\sqrt{\sigma^2}$ or $\sqrt{s^2}$


Preparing a table to help work out the s.d.

Class         Freq.   Class mark
              fj      xj           fj·Xj    fj·Xj²    Cum. freq.
80 &U 90      1       85           85       7225      1
90 &U 100     2       95           190      18050     3
100 &U 110    6       105          630      66150     9
110 &U 120    3       115          345      39675     12
120 &U 130    2       125          250      31250     14
130 &U 140    2       135          270      36450     16
Total         16      660          1770     198800


Working out the standard deviation for the example

$s^2 = \dfrac{\sum f_j X_j^2 - (\sum f_j X_j)^2 / n}{n - 1} = \dfrac{198800 - 1770^2 / 16}{15} = \dfrac{2993.75}{15} = 199.58$

Standard deviation: $s = \sqrt{s^2} = 14.13$

Mean = 1770 / 16 = 110.625
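The same working can be reproduced in Python as a check on the table totals. This is a sketch; `grouped_variance` and its `sample` flag are illustrative names, not part of the lecture.

```python
import math


def grouped_variance(freqs, marks, sample=True):
    """Variance from a frequency table, using the computational formula."""
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, marks))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))
    numerator = sum_fx2 - sum_fx ** 2 / n
    # Divide by n - 1 for a sample, by N for a population.
    return numerator / (n - 1) if sample else numerator / n


freqs = [1, 2, 6, 3, 2, 2]
marks = [85, 95, 105, 115, 125, 135]
s2 = grouped_variance(freqs, marks)
s = math.sqrt(s2)
print(round(s2, 2), round(s, 2))  # 199.58 14.13
```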


Shape

Skewness relates to the symmetry of a distribution.

Positively skewed (right skewed): the tail extends to the right; mean > median > mode.

Negatively skewed (left skewed): the tail extends to the left; mean < median < mode.


Standard scores

The standard score expresses any observation in terms of the number of standard deviations it is from the mean.

t score (for a sample): $t = \dfrac{X - \bar{X}}{s}$

z score (for a population): $z = \dfrac{X - \mu}{\sigma}$


Interpretation of standard score

For a sample with mean 5 and standard deviation 2:

t score for 8 = (8 - 5) / 2 = 1.5

Interpretation: the observation is 1.5 standard deviations above the sample mean.


Bivariate Variables: Summary measures


Bivariate variables

In the previous parts, we dealt throughout with a single numerical variable, such as the rate of return of mutual funds.

From this lecture on, we study pairs of variables that may be correlated.


Two numerical variables

A case: in a call center, operators were trained to handle phone calls. However, call durations differ considerably from one call to another. The shorter the duration of a call, the more efficient the operator is taken to be.

Suppose the call center manager wants to know whether the training hours the operators received are correlated with the duration of the phone calls they handled.

The data collected are as follows, where X is training hours and Y is call duration in minutes:

X (training hours):  6.5   7.5   6     8.5   5.5   3.5    8.5   8     8     7     8.5   9.5
Y (duration mins):   6.2   2.9   9.2   3.2   8.9   13.6   2.5   4.2   4.3   3.1   3.4   2.7
X (training hours):  …
Y (duration mins):   …

In total, this data set covers 54 phone calls.

What we want to find out is whether these two variables (X, training hours of operators; Y, duration of calls in minutes) show any real correlation. Put simply, the call center manager wants to know whether operators with more training hours handle calls in less time.


Setting up a scatter diagram for the data

A scatter diagram (scattergram) of two variables indicates the form, type and strength of the relationship.

Form – whether linear or non-linear

Type – direct (positive) or inverse (negative)

Strength – how closely data are co-ordinated, e.g. if linear, how close ordered pairs are to a line describing their relationship. This is indicated by a correlation measure.


(Pearson's) Coefficient of Correlation

This is a summary measure, r, that describes the form, type and strength of the relationship shown in a scattergram. r ranges from -1 to 1:

-1: perfect negative relationship, all points exactly on a negatively sloping line
0: no linear correlation
1: perfect positive relationship, all points exactly on a positively sloping line

$r = \dfrac{\sum XY - (\sum X)(\sum Y) / n}{\sqrt{\left[\sum X^2 - (\sum X)^2 / n\right]\left[\sum Y^2 - (\sum Y)^2 / n\right]}}$


Back to the case study

r (Pearson's coefficient of correlation) = -0.9209

This means X and Y have a very strong negative linear relationship.

In other words, the training hours the operators received show a strong negative relationship with the duration of the calls they handled.
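The value of r can be checked from summary totals alone. The sketch below assumes the column totals for the 54 calls reported in the lecture's regression working table (ΣX = 391.5, ΣY = 290.7, ΣX² = 2974.25, ΣXY = 1863.55, ΣY² = 2081.69); the function name `pearson_r` is illustrative.

```python
import math

# Column totals for the 54 calls, as reported in the lecture's working table.
n = 54
sum_x, sum_y = 391.5, 290.7
sum_x2, sum_xy, sum_y2 = 2974.25, 1863.55, 2081.69


def pearson_r(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson's r from sums, using the computational formula."""
    sxy = sum_xy - sum_x * sum_y / n
    sxx = sum_x2 - sum_x ** 2 / n
    syy = sum_y2 - sum_y ** 2 / n
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(round(r, 4))  # -0.9209
```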


In-depth analysis of this linear relationship – linear regression

Determining the Coefficient of Correlation is concerned with summarizing the form, type and strength of the relationship between two variables.

The motivation for regression is the desire to quantify the relationship, often for the purposes of using the knowledge of one variable to predict the other.

That is, we use one variable (X) to predict the other variable (Y).


The regression line is mathematically expressed by this equation:

Yc = a + bX

Yc is the computed value of Y.

a is the sample regression constant, or Y-intercept.

b is the sample regression coefficient, or slope of the line.


Least squares method

This is a mathematical technique that determines what values of a and b minimize the sum of squared differences. Any values for a and b other than those determined by the least-squares method result in a greater sum of squared differences between the actual value of Y and the predicted value of Y.

Simply put, least-squares method is used to find a line of best fit for two correlated variables.


Working out the linear regression ~!

A residual is defined as the vertical distance between the actual value and the predicted value (the point on the line of best fit).

In least-squares regression, we find the values of a and b such that the sum of squares of the residuals is a minimum.

Actual pairs: (X1, Y1), (X2, Y2), …
Predicted (calculated) pairs: (X1, Yc1), (X2, Yc2), …


Back to the case study

We now know that training hours are correlated with the duration of calls. This suggests that if we know the training hours an operator received, we can predict how many minutes, on average, he or she will take to handle a phone call.

In linear regression terms: given X, we use the line fitted by the least squares method to calculate Y.


Solutions for a & b

Two formulae, respectively for b and a:

$b = \dfrac{n \sum XY - \sum X \sum Y}{n \sum X^2 - (\sum X)^2}$

$a = \dfrac{\sum Y - b \sum X}{n}$


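Establishing a table to work out linear regression

Xi       Yi       Xi²       XiYi      Yi²
6.5      6.2      42.25     40.3      38.44
7.5      2.9      …         …         …
6        9.2      …         …         …
…        …        …         …         …
8.5      2.8
7.5      5.9
6        6.5      36        39        42.25
Total    391.5    290.7     2974.25   1863.55   2081.69

(The totals are ΣXi = 391.5, ΣYi = 290.7, ΣXi² = 2974.25, ΣXiYi = 1863.55 and ΣYi² = 2081.69.)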


Outcomes

b = -1.79595, a = 18.40399

Then Yc = 18.404 - 1.796X. This is the linear regression.

Interpretation: for each extra hour of training, there is an associated decrease of 1.796 minutes in call duration.
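The slope and intercept follow directly from the column totals. The sketch below plugs the lecture's totals into the two formulae; `predict_duration` is a hypothetical helper added only to show how the fitted line is used.

```python
# Column totals for the 54 calls, from the lecture's working table.
n = 54
sum_x, sum_y = 391.5, 290.7
sum_x2, sum_xy = 2974.25, 1863.55

# Least-squares slope and intercept.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n
print(round(b, 5), round(a, 3))  # -1.79595 18.404


def predict_duration(training_hours):
    """Computed call duration Yc (minutes) for a given number of training hours."""
    return a + b * training_hours


# Illustrative use: an operator with 7 hours of training.
print(round(predict_duration(7.0), 2))
```

Predictions are only meaningful for training hours within the range of the observed data.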


One consideration

Note: regression says nothing about causation, only about association. This means X does not necessarily cause a change in Y. In our case, training hours do not necessarily change the duration of calls; the two are merely correlated.

Think about it: does smoking cigarettes cause a shorter life expectancy? A regression by itself cannot answer that; it only shows association.


The standard error of the estimate

Standard error measures how well actual Y and computed Y are matched – the smaller Se, the better the match and predictive accuracy.

$S_e^2 = \dfrac{\sum (Y - Y_c)^2}{n - 2}$

$S_e = \sqrt{S_e^2}$


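Note!

The standard error of the estimate is very similar to the standard deviation.

The standard error measures the spread of bivariate data around the regression line, whilst the standard deviation measures the spread of univariate data around the mean.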


Computational form for Se

You can use this computational form to find Se:

$S_e = \sqrt{\dfrac{\sum Y^2 - a \sum Y - b \sum XY}{n - 2}}$
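As an illustration, the computational form can be evaluated with the call-center totals. The lecture does not report a value for Se, so the figure printed here is simply what the formula gives for those totals; a and b are recomputed at full precision to avoid rounding error.

```python
import math

# Column totals for the 54 calls, from the lecture's working table.
n = 54
sum_x, sum_y = 391.5, 290.7
sum_x2, sum_xy, sum_y2 = 2974.25, 1863.55, 2081.69

# Least-squares coefficients (unrounded).
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# Unexplained variation and the standard error of the estimate.
sse = sum_y2 - a * sum_y - b * sum_xy
se = math.sqrt(sse / (n - 2))
print(round(se, 2))  # about 1.23 minutes
```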


Coefficient of determination

Total variation = SST = $\sum Y^2 - (\sum Y)^2 / n$

Explained variation = SSR = SST - SSE

Unexplained variation = SSE = $\sum Y^2 - a \sum Y - b \sum XY$

Coefficient of determination = $r^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$


Coefficient of determination

The coefficient of determination, $r^2$, by calculation turned out to be 0.848.

This means 85% of total variation in call duration (around the average duration level) has been explained by a linear relation between duration and training hours.
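The 0.848 figure can be reproduced from the same column totals. This is a sketch using the SST, SSE and SSR definitions above, with a and b recomputed at full precision.

```python
# Column totals for the 54 calls, from the lecture's working table.
n = 54
sum_x, sum_y = 391.5, 290.7
sum_x2, sum_xy, sum_y2 = 2974.25, 1863.55, 2081.69

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

sst = sum_y2 - sum_y ** 2 / n            # total variation
sse = sum_y2 - a * sum_y - b * sum_xy    # unexplained variation
ssr = sst - sse                          # explained variation
r_squared = ssr / sst
print(round(r_squared, 3))  # 0.848
```

The result also equals the square of Pearson's r, (-0.9209)² ≈ 0.848.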


We have just seen summary measures for dealing with two numerical variables.

What about ordinal data?


Two ordinal variables

A scattergram can also be used to illustrate a possible relationship between two ordinal variables.

We often have ordinal variables in fields such as Marketing and Management, where people have been asked to rank some attribute.

An example could be a series of taste trials carried out during product development, such as the example below, where a panel was asked to rank soft drinks by "Refreshingness" and "Sweetness".


Understanding this example

This example shows which of the drinks is the most refreshing, which is the second most refreshing, and so on.

Likewise, which is the sweetest, which is the second sweetest, and so on.


Drink        Refresh Rank   Sweetness Rank
Slurp        1              8
Fizz         2              7
Fizz Plus    5              10
Binge        6              9
Slam         3              5
Dunk         4              6
Whizz        10             2
Pling        9              3
Tweak        7              1
Blitz        8              4


[Figure: scattergram "Sweetness vs Refreshingness", with Refreshing Rank on the horizontal axis and Sweetness Rank on the vertical axis, both running from 0 to 12]


Spearman's Rank Correlation Coefficient

Spearman's Rank C.C. can be used as a summary measure to gauge the degree of relationship between two ordinal variables.

Spearman's Rank C.C. is given the symbol $r_s$ for sample data (and $\rho_s$ for population data). $\rho$ is the Greek letter "rho" (the Greek equivalent of "r").

It is usually calculated using the following shortcut formula:

$r_s = 1 - \dfrac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$

where $d_i$ is the difference between the ranks of the ith pair of observations, and n is the number of pairs of observations.


Notes on this shortcut formula

Strictly speaking, this formula only works when the number of ties is relatively small. If more than about 1/4 to 1/3 of the observations of a variable are in ties, the shortcut formula starts to become unreliable. We will deal with ties later. When there are too many ties we need to use the "long" formula.


What are ties?


Ties are observations with equal values, which therefore compete for the same rank.

Dealing with ties: we allocate the average rank of all observations involved in the tie to each observation involved in the tie.

Standard & Poor's bond ratings for a random sample of 12 bonds:

C BB A AA A BBB CC D B A AA AAA

Sorted from best to worst, with positions and tie-adjusted ranks:

Rating:     AAA   AA    AA    A    A    A    BBB   BB   B    CC   C    D
Position:   1     2     3     4    5    6    7     8    9    10   11   12
Rank:       1     2.5   2.5   5    5    5    7     8    9    10   11   12


Two people came equal third (that is, the next person came fifth). These share the 3rd & 4th positions and thus each is given a rank of 3.5.

Placing:   1.0   2.0   3*    3*    5.0   6.0   7.0
Ranking:   1.0   2.0   3.5   3.5   5.0   6.0   7.0
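The averaging rule for ties can be sketched in Python. This is a minimal sketch: `average_ranks` is an illustrative name, and the list `GRADES` contains only the rating grades that appear in the sample above, ordered from best to worst.

```python
def average_ranks(scores):
    """Rank scores from smallest (rank 1) upward; ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


# Only the grades present in the sample, best first (an assumption for this sketch).
GRADES = ["AAA", "AA", "A", "BBB", "BB", "B", "CC", "C", "D"]
bonds = ["C", "BB", "A", "AA", "A", "BBB", "CC", "D", "B", "A", "AA", "AAA"]
bond_ranks = average_ranks([GRADES.index(g) for g in bonds])
print(bond_ranks)
# [11.0, 8.0, 5.0, 2.5, 5.0, 7.0, 10.0, 12.0, 9.0, 5.0, 2.5, 1.0]

placing_ranks = average_ranks([1, 2, 3, 3, 5, 6, 7])
print(placing_ranks)  # [1.0, 2.0, 3.5, 3.5, 5.0, 6.0, 7.0]
```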


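Rankings with ties

When rankings involve ties they give us two extra problems:

- how to deal with the ties
- the shortcut formula may be unreliable if there are too many ties, and we need to use a longer formula:

$r_s = \dfrac{n \sum_{i=1}^{n} X_i Y_i - \left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{\sqrt{\left[n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2\right]\left[n \sum_{i=1}^{n} Y_i^2 - \left(\sum_{i=1}^{n} Y_i\right)^2\right]}}$

The full Spearman formula: use when there are ties! Here $X_i$ and $Y_i$ are the ranks of the ith pair of observations.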


Example - using the "shortcut" formula

Drink        Refresh Rank   Sweetness Rank   di    di²
Slurp        1              8                -7    49
Fizz         2              7                -5    25
Fizz Plus    5              10               -5    25
Binge        6              9                -3    9
Slam         3              5                -2    4
Dunk         4              6                -2    4
Whizz        10             2                8     64
Pling        9              3                6     36
Tweak        7              1                6     36
Blitz        8              4                4     16
Total                                              268


Result!

rs = 1 - 6*268 / (10*99) = -0.624

This indicates quite a strong negative relationship between refreshingness and sweetness (as we saw in the scattergram).
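The drinks example can be reproduced in Python. The sketch below implements the shortcut formula and, as a cross-check, the full formula applied to the ranks; with no ties the two should agree. Function names are illustrative.

```python
import math

# Ranks from the taste-trial table (Slurp, Fizz, Fizz Plus, ..., Blitz).
refresh = [1, 2, 5, 6, 3, 4, 10, 9, 7, 8]
sweet = [8, 7, 10, 9, 5, 6, 2, 3, 1, 4]


def spearman_shortcut(rank_x, rank_y):
    """Shortcut formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def spearman_full(rank_x, rank_y):
    """Full formula: Pearson's r computed on the ranks."""
    n = len(rank_x)
    sx, sy = sum(rank_x), sum(rank_y)
    sxy = sum(x * y for x, y in zip(rank_x, rank_y))
    sx2 = sum(x * x for x in rank_x)
    sy2 = sum(y * y for y in rank_y)
    return (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))


rs = spearman_shortcut(refresh, sweet)
print(round(rs, 3))  # -0.624
print(round(spearman_full(refresh, sweet), 3))  # -0.624 (no ties, so both agree)
```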


Example – using the “long” formula

A students' association's satisfaction ratings for 8 courses, and the seniority of the person taking each course, are listed below. Use Spearman's Rank C.C. to investigate the relationship between the two.


End of Module 2. We are moving on to Module 3!


Module 3: Probability & Probability Distributions


Probability

What is meant by the word probability?

Probability is the likelihood or chance that a particular event will occur.

Three approaches to probability:

1. A priori classical probability
2. Empirical classical probability
3. Subjective probability


A priori classical probability

The probability of success is based on prior knowledge of the process involved.

Probability of occurrence = $\dfrac{X}{T}$

X = number of ways in which the event occurs
T = total number of elementary outcomes


Example for a priori classical probability

A box contains 20 chocolate beans, of which 10 are red and the other 10 are green.

The probability of selecting a red bean on any one draw is 10 / 20 = 0.5.

Because we know the total number of beans and the proportion of each color in advance, we call it "a priori probability".


Empirical classical probability

Empirical classical probability adopts the same formula to calculate the probability of occurrence:

Probability of occurrence = $\dfrac{X}{T}$

However, in empirical classical probability, the probability of success is based on observed data instead of prior knowledge (a priori).


Example for empirical classical probability

Your mid-term exam is coming and this exam is said to be optional, which means you can choose whether or not to take it.

If we take a poll asking students whether they will attend the exam and 99% say they will, we say there is a 0.99 probability that an individual student will attend the exam.

Remember, in this example we did not know in advance how many students wished to take the exam. This is different from the a priori classical example, in which we already knew 50% were red and 50% were green.

So an empirical probability is an estimate from observed data, and it can change when new data are collected.


Subjective probability

From the name we can infer that this approach to probability is based on a person's own judgment.

For instance: you think you have a 90% probability of passing the CPA exam, while your supervisor thinks your probability of passing is 60%.

Both probabilities are based on personal judgment and experience rather than on objective evidence.


Sample spaces and events

Event: each possible type of occurrence is referred to as an event.

Simple event: a simple event can be described by a single characteristic.

Sample space: the collection of all the possible events is called the sample space.


Axioms about probability

Given a sample space S = {E1, E2, …, En}, the probabilities assigned to the Ei must satisfy:

0 ≤ P(Ei) ≤ 1 for each i. If an event has no chance of occurring, its probability is 0, and if an event is certain to occur, its probability is 1.

P(E1) + P(E2) + … + P(En) = ∑P(Ei) = 1

Probability of event A = sum of the probabilities of the simple events comprising A.


Contingency tables

By example:

Intent to purchase investigation

This kind of investigation often takes place in sales and marketing research scenarios.

In this example, the sample is 1,000 households, studied in terms of their purchase behavior for a big-screen TV set.


In the investigation, households first stated one of two purchase intents:

1. Planned to purchase: 300 households
2. Not planned to purchase: 700 households

After the purchase period, we can further subdivide the sample of 1,000 households into:

1. Actually purchased
2. Not purchased


So, in this example, the big sample of 1,000 can be described by four different sub-samples:

1. Planned to purchase
2. Not planned to purchase
3. Purchased
4. Not purchased


Later, the outcomes of actual purchase and no purchase turned out to be not that consistent with the originally stated intents.

In the first category (planned to purchase, 300 households), 200 out of 300 actually purchased and the remaining 100 did not.

In the second category (not planned to purchase, 700 households), 50 out of 700 actually purchased, and the remaining 650 acted consistently with their initial intent.


Complement and joint event

The complement of event A includes all events that are not part of event A. The complement of A is given by the symbol A' or $\bar{A}$.

In the above example, the 700 households that did not plan to purchase are the complement of the 300 that planned to purchase.

Joint event: a joint event is an event that has two or more characteristics. In the above example, the event "planned to purchase and actually purchased" is a joint event.


There are usually two ways to depict events in a sample space:

Contingency table, also called a "table of cross-classification"

Venn diagram

Now, based on the above example, we learn to construct the contingency table and the Venn diagram.


Contingency table

                      Planned to purchase
Actually purchased    Yes     No      Total
Yes                   200     50      250
No                    100     650     750
Total                 300     700     1000

Terms

Intersection A∩B: both A and B occur together, the joint event (sometimes simply written as AB).

Union A∪B: either A or B or both. Other common forms of notation include A∨B, A+B, A OR B.


Example of using the above two notations

Number (n) of cards that are a Heart or an Ace in a deck of playing cards (52 cards):

n(H∪A) = n(H) + n(A) - n(H∩A) = 13 + 4 - 1 = 16


Complement

Complement A': event A does not occur; another form is NOT A.

Example: non-hearts = H', n(H') = 39

Complement rule: P(A) = 1 - P(A')


Mutually exclusive and collectively exhaustive

Mutually exclusive: occurrence of one event precludes occurrence of another. If A and B are mutually exclusive, then n(A∩B) = 0.

Collectively exhaustive: the events together comprise the sample space; at least one event is certain to occur. Example: female students ∪ male students = all 26 students (QM course).


More to understand mutually exclusive and collectively exhaustive

Everyone is either female or male (collectively exhaustive), but no one is both (mutually exclusive).

Being female and being male are therefore mutually exclusive and collectively exhaustive events.

In the example of the TV set purchase: every household either planned to purchase or did not (collectively exhaustive), but no household is both "planned to purchase" and "not planned to purchase" (mutually exclusive).


Probability contingency table

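Numbers        M        M'       Total
O              7        14       21
O'             24       35       59
Total          31       49       80

Probabilities  M        M'       Total
O              0.0875   0.175    0.2625
O'             0.3      0.4375   0.7375
Total          0.3875   0.6125   1

Each probability is the corresponding count divided by the grand total of 80, e.g. 7 / 80 = 0.0875.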
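Turning a table of counts into a table of probabilities is a one-line division. The sketch below stores the four cells in a dictionary keyed by (row, column) labels; the helper `marginal` and the dictionary layout are illustrative choices, not part of the lecture.

```python
# Cell counts from the contingency table: (row label, column label) -> count.
counts = {("O", "M"): 7, ("O", "M'"): 14, ("O'", "M"): 24, ("O'", "M'"): 35}
total = sum(counts.values())

# Joint probabilities: each count divided by the grand total.
joint = {cell: c / total for cell, c in counts.items()}


def marginal(joint_probs, label, axis):
    """Sum the joint probabilities over one row (axis=0) or column (axis=1)."""
    return sum(p for cell, p in joint_probs.items() if cell[axis] == label)


print(round(joint[("O", "M")], 4))    # 0.0875
print(round(marginal(joint, "O", 0), 4))  # 0.2625
print(round(marginal(joint, "M", 1), 4))  # 0.3875
```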


General form of a 2×2 contingency table

Probabilities   A           A'           Total
B               P(A∩B)      P(A'∩B)      P(B)
B'              P(A∩B')     P(A'∩B')     P(B')
Total           P(A)        P(A')        1


Simple (marginal) probability : P(A)

The most fundamental rule for probabilities is that they range from 0 to 1.

Simple (marginal) probability refers to the probability of occurrence of a simple event. P(A).

Example: what is the probability that a heart is selected from a deck of playing cards?

P(heart) = 13 / 52 = 0.25


Joint probability : P(A∩B)

Joint probability refers to situations involving two or more events, such as the probability of planned to purchase and actually purchased in the big-screen TV set purchase example.

Joint probability means that both event A and B must occur simultaneously.

So, P(planned ∩purchased ) = 200/1000 = 0.2


Computing marginal probability

In fact, the marginal probability of an event is the sum of a set of joint probabilities.

The formula: P(A) = P(A and B1) + P(A and B2) + … + P(A and Bk), where B1, B2, …, Bk are mutually exclusive and collectively exhaustive events.

In the previous example:
P(planned to purchase) = P(planned to purchase and purchased) + P(planned to purchase and did not purchase) = 200/1000 + 100/1000 = 0.30


Addition rule

P(A∪B) = P(A) + P(B) – P(A∩B)

N.B. If A, B are mutually exclusive, then P(A∩B) = 0, and P(A∪B) = P(A) + P(B)


Conditional probability

To spot conditional probabilities, look for words like "of", "if" and "given". Suppose:

D = "part is defective", and B = "part was produced by B". Each of the following would tell you P(D|B):

- If a part was produced by B, there is a 5% chance it is defective.

- 5% of the parts produced by B are defective.

- There is a 5% chance that a part is defective, given that it was produced by B.


Back to the TV purchase example

P(actually purchased | planned to purchase) = (planned to purchase and actually purchased) / (planned to purchase) = 200 / 300 = 0.67

P(B|A) = P(A and B) / P(A)

Here: A = planned to purchase, B = actually purchased
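The conditional probability can be checked against the household counts stated earlier in the example (200 of the 300 households that planned to purchase actually purchased; 250 households purchased in total). The variable names below are illustrative.

```python
# Household counts from the purchase-intent example.
total = 1000
planned = 300                # planned to purchase
purchased = 250              # actually purchased
planned_and_purchased = 200  # joint event

p_a = planned / total                      # P(A), A = planned to purchase
p_a_and_b = planned_and_purchased / total  # P(A and B), B = actually purchased

# P(B|A) = P(A and B) / P(A)
p_purchased_given_planned = p_a_and_b / p_a
print(round(p_purchased_given_planned, 2))  # 0.67

# Conditioning the other way round gives a different number:
# P(A|B) = P(A and B) / P(B)
p_planned_given_purchased = p_a_and_b / (purchased / total)
print(round(p_planned_given_purchased, 2))  # 0.8
```

The two results differ because the conditioning event, and therefore the denominator, is different.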


Independence

Two events, A and B, are independent, if the probability of A occurring is not affected by B and vice versa.

A, B independent if :

P(A) = P(A|B) , P(B) = P(B|A)

P(AB) = P(A)P(B) only if A and B are independent.
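As an illustration, the product test can be applied to the purchase example, using the counts stated earlier (300 planned, 250 purchased, 200 both, out of 1,000 households). This is only a sketch of the check; the variable names are illustrative.

```python
import math

total = 1000
p_planned = 300 / total    # P(A)
p_purchased = 250 / total  # P(B)
p_both = 200 / total       # P(A and B)

# Independent only if P(A and B) equals P(A) * P(B).
independent = math.isclose(p_both, p_planned * p_purchased)
print(round(p_planned * p_purchased, 3), independent)  # 0.075 False
```

Since 0.2 is not equal to 0.075, planning to purchase and actually purchasing are not independent in this sample.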


Bayes' Theorem

Bayes' rule is useful in decision analysis. Let's learn it through the following example:

A machine is known to be in good condition 90% of the time. If in good condition, only 1% of output is defective. If in poor condition, 10% of output is defective.

An item of output is observed to be defective. Given this information what is the probability that the machine is in good condition?


Solution

G: condition of machine is good.
D: an item of output is defective.

Probabilities:
Prior (pre-condition): P(G) = 0.9, P(G') = 0.1
Conditional: P(D|G) = 0.01, P(D|G') = 0.10

P(G|D) = P(D|G)*P(G) / P(D) (conditional probability)

But we need to find P(D).


P(defect) = P(defect and good condition) + P(defect and poor condition)

P(D) = P(G∩D) + P(G'∩D) = P(D|G)P(G) + P(D|G')P(G') = 0.01*0.9 + 0.10*0.1 = 0.019

Then: P(G|D) = 0.009 / 0.019 = 0.47
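The machine example can be reproduced with a short function. This is a sketch for the two-state case only (good versus poor condition); `bayes_posterior` and its parameter names are illustrative.

```python
def bayes_posterior(prior, likelihood, likelihood_given_not):
    """P(G|D) for a two-state problem.

    prior                = P(G)
    likelihood           = P(D|G)
    likelihood_given_not = P(D|G')
    """
    # Total probability of the evidence: P(D) = P(D|G)P(G) + P(D|G')P(G')
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence


p_good_given_defect = bayes_posterior(0.9, 0.01, 0.10)
print(round(p_good_given_defect, 2))  # 0.47
```

Observing a defective item lowers the probability that the machine is in good condition from 0.9 to about 0.47.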


Expression of Bayes’ Rule

P(A|B) = P(B|A)P(A) / P(B)

This follows from the formula for joint probability, P(A∩B) = P(B|A)P(A), divided by P(B).


Counting Rule 1

If any one of k different mutually exclusive and collectively exhaustive events can occur on each of n trials, the number of possible outcomes is equal to

$k^n$


Example for counting rule 1

A coin (two sides) is tossed 10 times; the number of outcomes is

$2^{10} = 1{,}024$


Counting Rule 2

If there are k1 events on the first trial, k2 events on the second trial, … and kn events on the nth trial, then the number of possible outcomes is

(k1)(k2) … (kn)


Example for counting rule 2

A license plate consists of 3 letters (26 letters in total: a, b, c, …, z) followed by 3 digits (0 to 9, i.e. 10 possible digits). The number of possible outcomes is:

26 × 26 × 26 × 10 × 10 × 10 = 17,576,000


Counting Rule 3

The number of ways that n objects can be arranged in order is:

n! = (n)(n-1)…(1), with 0! = 1

"!" reads "factorial". "n!" is read "n factorial".


Example for counting rule 3

The number of ways that 6 books can be arranged is:

n! = 6! = 6*5*4*3*2*1 = 720


Counting Rule 4

Permutations: the number of ways of arranging X objects selected from n objects in order is:

$\dfrac{n!}{(n - X)!}$

Permutation: each possible arrangement is called a permutation.


Example for counting rule 4

The number of ordered arrangements of 4 books selected from 6 books is:

$\dfrac{n!}{(n - X)!} = \dfrac{6!}{(6 - 4)!} = 360$


Counting Rule 5

Combinations: the number of ways of selecting X objects out of n objects, irrespective of order, is:

$\dbinom{n}{X} = \dfrac{n!}{X!\,(n - X)!}$


Example for counting rule 5 – also called rule of combinations

For 4 books out of 6 books, the number of selections (note: order is irrelevant) is:

$\dbinom{6}{4} = \dfrac{6!}{4!\,(6 - 4)!} = 15$
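The five counting rules can be checked with Python's standard `math` module (`math.perm` and `math.comb` require Python 3.8 or later). The variable names are illustrative; the numbers are those of the lecture's examples.

```python
import math

coin_outcomes = 2 ** 10               # Rule 1: k^n with k = 2, n = 10
plate_outcomes = 26 ** 3 * 10 ** 3    # Rule 2: product of choices per position
book_orderings = math.factorial(6)    # Rule 3: n!
ordered_picks = math.perm(6, 4)       # Rule 4: n! / (n - X)!
unordered_picks = math.comb(6, 4)     # Rule 5: n! / (X! (n - X)!)

print(coin_outcomes)    # 1024
print(plate_outcomes)   # 17576000
print(book_orderings)   # 720
print(ordered_picks)    # 360
print(unordered_picks)  # 15
```

Each selection of 4 books can be ordered in 4! = 24 ways, which is why rule 4 gives 24 times as many results as rule 5 (360 = 24 × 15).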
