BEA140 Leon Jiang, University of Tasmania
Lecture 3, Summer Semester 2009
BEA 140
By Leon Jiang
Some further points on univariate data
Central tendency
Mean Median Mode
Variance
Population: $\sigma^2 = \left( \sum X_i^2 - (\sum X_i)^2 / N \right) / N$
Sample: $s^2 = \left( \sum X_i^2 - (\sum X_i)^2 / n \right) / (n-1)$
Standard deviation
$s^2 = \left( \sum X_i^2 - (\sum X_i)^2 / n \right) / (n-1)$
$s = \sqrt{ \left[ \sum X_i^2 - (\sum X_i)^2 / n \right] / (n-1) }$
The meaning of the standard deviation
“For most data batches around two thirds (or 68%) of the data will fall within one standard deviation of the mean, and around 95% within two standard deviations of the mean.”
- the empirical rule (a rule of thumb)
MEASURING FROM GROUPED DATA
Measuring for grouped data
When no raw data are available and we have only a secondary source, we have to analyze that secondary data set, which has already been grouped for reporting purposes.
A set of grouped data differs from a set of raw data in that the information in it has already been grouped, somewhat arbitrarily.
Grouped data are therefore less objective than raw data, and measures computed from them contain small errors.
Generally we use a frequency distribution table to show the grouping of data
Time        Number of calls (fj)   Class mark (xj)   fjxj   fjxj^2   Cum. freq.
1<=X<3      11                     2                 22     44       11
3<=X<5      19                     4                 76     304      30
5<=X<7      10                     6                 60     360      40
7<=X<9      9                      8                 72     576      49
9<=X<11     2                      10                20     200      51
11<=X<13    1                      12                12     144      52
13<=X<15    1                      14                14     196      53
15<=X<17    0                      16                0      0        53
17<=X<19    1                      18                18     324      54
19<=X<21    0                      20                0      0        54
Total       54                                       294    2148
Class mark for frequency distribution of grouped data
The class mark, Xj, is a representative value of all observations located in the class.
A class mark is determined by the largest value and the smallest value in the class.
Xj = (largest value + smallest value) / 2
Xj = (RUCL + RLCL) / 2, where RUCL => the largest value and RLCL => the smallest value.
For example, the class 1<=X<3 has class mark (3 + 1) / 2 = 2.
Central tendency for grouped data
The mean of grouped data (g.d.) is defined as the weighted sum of class marks, with class frequencies as weights, divided by the number of observations, i.e.
$\bar{X} = \left( \sum f_j x_j \right) / n$
For the call data: $\bar{X} = 294 / 54 = 5.44$
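As a minimal sketch, the grouped mean can be checked in Python using the class marks and frequencies from the call table. The names `class_marks`, `frequencies` and `grouped_mean` are illustrative choices, not part of the lecture.

```python
# Class marks (xj) and frequencies (fj) from the call-duration table.
class_marks = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
frequencies = [11, 19, 10, 9, 2, 1, 1, 0, 1, 0]


def grouped_mean(marks, freqs):
    """Weighted mean of class marks, using class frequencies as weights."""
    n = sum(freqs)
    return sum(f * x for f, x in zip(freqs, marks)) / n


mean_calls = grouped_mean(class_marks, frequencies)
print(round(mean_calls, 2))  # 5.44
```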
Median for g.d.
1. Locate the median class: the class containing the median. But how and where?
- The total number of calls in the frequency distribution is 54 (an even number).
- Therefore, according to the formula for the median position, (n + 1) / 2, the median is the 27.5th value.
- The class containing the 27.5th value is the median class. The cumulative frequency reaches 11 at the end of the first class and 30 at the end of the second, so the median class is 3<=X<5.
FORMULA FOR MD:
MD = LCL + class width * (how far into class) / (how many in class)
MD = 3.0 + 2 * (27.5 - 11) / 19
Small errors likely exist most of the time
Median from raw data = 4.4
Median from grouped data = 3.0 + 2 * (27.5 - 11) / 19 = 4.74
An example: MD = LCL + class width * (how far into class) / (how many in class)
Class ("&U" = "and under")   Freq.   Cum. freq.
80 &U 90                     1       1
90 &U 100                    2       3
100 &U 110                   6       9
110 &U 120                   3       12
120 &U 130                   2       14
130 &U 140                   2       16
Total                        16
MD = LCL + class width * (how far into class) / (how many in class)
The median is the (16 + 1) / 2 = 8.5th value, which lies in the class 100 &U 110.
MD = 100 + 10 * (8.5 - 3) / (9 - 3)
Median = 109.17
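The interpolation formula for the grouped median can be sketched as a small Python function. This is only an illustration under the assumptions used in the lecture (equal class widths, median position (n + 1) / 2); the function name `grouped_median` and its parameters are hypothetical.

```python
def grouped_median(lower_limits, width, freqs):
    """Median of grouped data by linear interpolation inside the median class.

    MD = LCL + class width * (how far into class) / (how many in class)
    """
    n = sum(freqs)
    position = (n + 1) / 2          # position of the median value
    cumulative = 0                  # cumulative frequency before the current class
    for lcl, f in zip(lower_limits, freqs):
        if cumulative + f >= position:
            return lcl + width * (position - cumulative) / f
        cumulative += f
    raise ValueError("frequency table is empty")


# The 16-observation example: classes 80 &U 90, ..., 130 &U 140.
example_median = grouped_median([80, 90, 100, 110, 120, 130], 10, [1, 2, 6, 3, 2, 2])
print(round(example_median, 2))  # 109.17
```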
Mode for g.d.
With grouped data, we tend to talk of a modal class, the class (or classes) with the highest frequency, rather than of the mode.
But if asked for a mode with grouped data, the best we can do is to give the class mark of the modal class, as follows:
Modal class: 3 &U 5 (19 observations)
Mode: 4 (class mark of the modal class)
Dispersion (variance) for grouped data
The sample variance formula is: $s^2 = \{ \sum f_j X_j^2 - (\sum f_j X_j)^2 / n \} / (n-1)$
The population variance formula is: $\sigma^2 = \{ \sum f_j X_j^2 - (\sum f_j X_j)^2 / N \} / N$
Standard deviation = $\sqrt{s^2}$ or $\sqrt{\sigma^2}$
Preparing a table to help work out the S.d.
Class        Freq. (fj)   Class mark (Xj)   fjXj   fjXj^2   Cum. freq.
80 &U 90     1            85                85     7225     1
90 &U 100    2            95                190    18050    3
100 &U 110   6            105               630    66150    9
110 &U 120   3            115               345    39675    12
120 &U 130   2            125               250    31250    14
130 &U 140   2            135               270    36450    16
Total        16           660               1770   198800
Working out the standard deviation for the example~!
$s^2 = \{ \sum f_j X_j^2 - (\sum f_j X_j)^2 / n \} / (n-1) = (198800 - 1770^2 / 16) / 15 = 199.58$
Standard deviation = $\sqrt{s^2}$
s = 14.13
Mean = 1770 / 16 = 110.625
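The grouped-data variance calculation can be reproduced with a short Python sketch. It uses the class marks and frequencies from the 16-observation table; the variable names are illustrative only.

```python
import math

# Class marks (Xj) and frequencies (fj) from the 16-observation example.
marks = [85, 95, 105, 115, 125, 135]
freqs = [1, 2, 6, 3, 2, 2]

n = sum(freqs)                                        # 16
sum_fx = sum(f * x for f, x in zip(freqs, marks))     # 1770
sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))  # 198800

# Sample variance: {sum(f X^2) - (sum(f X))^2 / n} / (n - 1)
sample_variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
sample_sd = math.sqrt(sample_variance)
grouped_mean = sum_fx / n                             # 110.625

print(sum_fx, sum_fx2, grouped_mean)
print(sample_variance, sample_sd)
```

Printing the unrounded values makes it easy to see how much rounding of intermediate results affects the final standard deviation.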
Shape
Skewness relates to the symmetry of the distribution.
Positively skewed or right skewed: tail extends to the right, mean > median > mode
Negatively skewed or left skewed: tail extends to the left, mean < median < mode
Standard scores
The standard score expresses any observation in terms of the number of standard deviations it is from the mean.
t score (for a sample): $t = (X - \bar{X}) / s$
z score (for a population): $z = (X - \mu) / \sigma$
Interpretation of standard score
Mean 5, standard deviation 2, for a sample
t score for 8 = (8 - 5) / 2 = 1.5
Interpretation: the observation is 1.5 standard deviations above the sample mean.
Bivariate Variables
Summary measures
Bivariate variables
In the previous parts, we dealt throughout with a single numerical variable, such as the rate of return of mutual funds.
From this lecture on, we shall study two variables and the correlation between them.
Two numerical variables
A case: In a call center, operators were trained to receive phone calls. However, the durations of calls differ significantly from one another. The shorter the duration of a call, the more efficient an operator proves to be.
Suppose the call center manager wants to know whether the training hours the operators received are correlated with the duration of the phone calls they handled.
The data collected are as follows: X = training hours, Y = call duration in minutes.
The data look like this
X (training hours): 6.5 7.5 6 8.5 5.5 3.5 8.5 8 8 7 8.5 9.5
Y (duration mins): 6.2 2.9 9.2 3.2 8.9 13.6 2.5 4.2 4.3 3.1 3.4 2.7
X (training hours): …
Y (duration mins): …
In total there are 54 phone calls in the data set being studied.
* What we want to find out is whether these two variables (X, training hours of operators; Y, duration in minutes of calls) show any real correlation. Put simply, the call center manager wants to know whether the more training hours the operators receive, the shorter the calls they handle will be.
Setting up a scatter diagram for the data here ~!
A scatter diagram (scattergram) between two variables will indicate the form, type and strength of the relation.
Form: whether linear or non-linear
Type: direct (positive) or inverse (negative)
Strength: how closely the data are co-ordinated, e.g. if linear, how close the ordered pairs are to a line describing their relationship. This is indicated by a correlation measure.
(Pearson’s) Coefficient of Correlation
This is a summary measure that describes the form, type and strength of a scattergram.
r ranges from -1 to 1:
-1: perfect negative relationship (all points exactly on a negatively sloping line)
0: no linear correlation
1: perfect positive relationship (all points exactly on a positively sloping line)
$r = \dfrac{\sum XY - (\sum X)(\sum Y)/n}{\sqrt{\left[\sum X^2 - (\sum X)^2/n\right]\left[\sum Y^2 - (\sum Y)^2/n\right]}}$
Back to the case study
r (Pearson’s coefficient of correlation) = -0.9209
This means X and Y have a very strong negative linear relationship.
In other words, the training hours the operators received show a strong negative relationship with the duration of the calls they handled.
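As a hedged sketch, r can be reproduced in Python from the column totals reported in this lecture's regression working table (n = 54, ΣX = 391.5, ΣY = 290.7, ΣX² = 2974.25, ΣXY = 1863.55, ΣY² = 2081.69). The function name `pearson_r` is an illustrative choice.

```python
import math

# Column totals from the call-center working table (n = 54 calls).
n = 54
sum_x, sum_y = 391.5, 290.7
sum_x2, sum_xy, sum_y2 = 2974.25, 1863.55, 2081.69


def pearson_r(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson's r computed from sums, using the computational formula."""
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt(
        (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
    )
    return numerator / denominator


r = pearson_r(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(round(r, 4))  # -0.9209
```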
In-depth analysis of this linear relationship: linear regression
Determining the coefficient of correlation is concerned with summarizing the form, type and strength of the relationship between two variables.
The motivation for regression is the desire to quantify the relationship, often for the purpose of using knowledge of one variable to predict the other.
That is, using one variable (X) to predict the other variable (Y).
The regression line is mathematically expressed by this equation
Yc = a + bX
Yc is the computed value of Y.
a is the sample regression constant, or Y-intercept.
b is the sample regression coefficient, or slope of the line.
Least squares method
This is a mathematical technique that determines what values of a and b minimize the sum of squared differences. Any values for a and b other than those determined by the least-squares method result in a greater sum of squared differences between the actual value of Y and the predicted value of Y.
Simply put, least-squares method is used to find a line of best fit for two correlated variables.
Working out the linear regression ~!
A residual is defined as the vertical distance between the actual value and the predicted value (the point on the line of best fit).
In least-squares regression, we find the values of a and b such that the sum of squares of the residuals is a minimum.
Actual pairs: (X1, Y1), (X2, Y2), …
Predicted (calculated) pairs: (X1, Yc1), (X2, Yc2), …
Back to the case study~!
We now know that the training hours correlate with the duration of calls. That is to say: if we know the training hours an operator received, we can in some sense predict how many minutes, on average, he or she should take to handle a phone call.
Or, in linear regression, we know X and, using the line fitted by the least squares method, we can calculate the predicted Y.
Solutions for a & b
Two formulae, for b and a respectively:
$b = \dfrac{n \sum XY - \sum X \sum Y}{n \sum X^2 - (\sum X)^2}$
$a = \dfrac{\sum Y - b \sum X}{n}$
Establishing a table to work out linear regression
Xi      Yi      Xi^2      XiYi      Yi^2
6.5     6.2     42.25     40.3      38.44
7.5     2.9     …         …         …
6       9.2     …         …         …
…       …       …         …         …
8.5     2.8
7.5     5.9
6       6.5     36        39        42.25
Total:
391.5   290.7   2974.25   1863.55   2081.69
Outcomes ~!
b = -1.79595, a = 18.40399. Then Yc = 18.404 - 1.796X
This is the linear regression line.
Interpretation: for each extra hour of training, there is an associated decrease of 1.796 minutes in call duration.
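The slope and intercept can be checked with a minimal Python sketch that applies the two least-squares formulae to the column totals of the working table. The helper `predict` and the 8-hour input are hypothetical illustrations, not part of the lecture.

```python
# Column totals from the call-center working table (n = 54 calls).
n = 54
sum_x, sum_y = 391.5, 290.7
sum_x2, sum_xy = 2974.25, 1863.55

# Least-squares slope and intercept.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

print(round(b, 3), round(a, 3))  # -1.796 18.404


def predict(training_hours):
    """Computed call duration Yc = a + bX for a given number of training hours."""
    return a + b * training_hours


# Hypothetical input: an operator with 8 hours of training.
print(predict(8.0))
```

Predictions are only meaningful for training hours within the range observed in the data.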
One consideration~!
Note: regression says nothing about causation, only about association~!
This means X does not necessarily cause a change in Y.
Or, the training hours do not necessarily change the duration of calls; the two are merely correlated.
Think about: does smoking cigarettes cause a shorter life expectancy?
Not really~! ?
The standard error of the estimate
The standard error measures how well the actual Y and the computed Y are matched: the smaller Se, the better the match and the predictive accuracy.
$S_e^2 = \dfrac{\sum (Y - Y_c)^2}{n - 2}$
$S_e = \sqrt{S_e^2}$
Note!
The standard error is very similar to the standard deviation.
The standard error is for bivariate data, whilst the standard deviation is for univariate data.
Computational form for Se.
You can use this computational form to find Se.
$S_e = \sqrt{\dfrac{\sum Y^2 - a \sum Y - b \sum XY}{n - 2}}$
Coefficient of determination
Total variation = SST = $\sum Y^2 - (\sum Y)^2 / n$
Explained variation = SSR = SST - SSE
Unexplained variation = SSE = $\sum Y^2 - a \sum Y - b \sum XY$
Coefficient of determination = SSR / SST = $1 - \dfrac{SSE}{SST}$
Coefficient of determination
- The coefficient of determination, $r^2$, turned out by calculation to be 0.848
This means 85% of the total variation in call duration (around the average duration level) has been explained by a linear relation between duration and training hours.
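A short Python sketch, again using the column totals from the working table, shows how SST, SSE, the coefficient of determination and the standard error of the estimate fit together. Variable names are illustrative; the slope and intercept are recomputed at full precision so the block is self-contained.

```python
import math

# Column totals from the call-center working table (n = 54 calls).
n = 54
sum_x, sum_y = 391.5, 290.7
sum_x2, sum_xy, sum_y2 = 2974.25, 1863.55, 2081.69

# Least-squares slope and intercept (kept unrounded to limit rounding error).
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

sst = sum_y2 - sum_y ** 2 / n            # total variation
sse = sum_y2 - a * sum_y - b * sum_xy    # unexplained variation
ssr = sst - sse                          # explained variation

r_squared = ssr / sst                    # coefficient of determination
se = math.sqrt(sse / (n - 2))            # standard error of the estimate

print(round(r_squared, 3))  # 0.848
print(se)
```

Using the rounded values a = 18.404 and b = -1.796 in the SSE formula would change the result noticeably, because SSE is a small difference between large numbers.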
We just saw summary measures for dealing with two numerical variables.
What about ordinal data?
Two ordinal variables
A scattergram can also be used to illustrate a possible relationship between two ordinal variables.
We often have ordinal variables in fields such as Marketing and Management, where people have been asked to rank some attribute.
An example could be a series of taste trials carried out during product development, such as the example below, where a panel was asked to rank soft drinks by “Refreshingness” and “Sweetness”.
Understanding this example
This example shows which of the drinks is the most refreshing, which is the second most refreshing, …
Likewise, which is the sweetest, which is the second sweetest, …
Drink       Refresh Rank   Sweetness Rank
Slurp       1              8
Fizz        2              7
Fizz Plus   5              10
Binge       6              9
Slam        3              5
Dunk        4              6
Whizz       10             2
Pling       9              3
Tweak       7              1
Blitz       8              4
[Figure: Sweetness vs Refreshingness. Scatter diagram of Sweetness Rank (vertical axis, 0 to 12) against Refreshing Rank (horizontal axis, 0 to 12).]
Spearman’s Rank Correlation Coefficient
Spearman’s Rank C.C. can be used as a summary measure to gauge the degree of relationship between two ordinal variables.
Spearman’s Rank C.C. is given the symbol $r_s$ for sample data (and $\rho_s$ for population data).
$\rho$ is the Greek letter ‘rho’ (the Greek equivalent to ‘r’).
It is usually calculated using the following short cut formula:
$r_s = 1 - \dfrac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$
Where $d_i$ is the difference between the ranks of the ith pair of observations, and n is the number of pairs of observations.
Notes to this short formula
Strictly speaking, this formula only works when the number of ties is relatively small. If more than about 1/4 to 1/3 of the observations of a variable are in ties, then the shortcut formula starts to become unreliable. We will deal with ties later. When there are too many ties we need to use the “long” formula.
What are ties?
Dealing with ties: we allocate to each observation involved in a tie the average rank of all observations involved in that tie.
Standard & Poor’s bond ratings for a random sample of 12 bonds:
C BB A AA A BBB CC D B A AA AAA
Ratings as sampled:      C    BB   A    AA   A    BBB  CC   D    B    A    AA   AAA
Ratings in order:        AAA  AA   AA   A    A    A    BBB  BB   B    CC   C    D
Position:                1    2    3    4    5    6    7    8    9    10   11   12
Rank (ties averaged):    1    2.5  2.5  5    5    5    7    8    9    10   11   12
Two people came equal third (that is, the next person came fifth). They share the 3rd and 4th positions and thus each is given a rank of 3.5.
placing   1.0   2.0   3*    3*    5.0   6.0   7.0
ranking   1.0   2.0   3.5   3.5   5.0   6.0   7.0
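The averaging of tied ranks can be sketched in Python as follows. The function `average_ranks` is a hypothetical helper written for illustration; it assumes the observations have already been sorted from best to worst, as in the bond-rating example.

```python
def average_ranks(ordered_values):
    """Rank an already ordered list, giving tied values the average
    of the positions they occupy."""
    ranks = []
    n = len(ordered_values)
    i = 0
    while i < n:
        # Find the last index j of the run of values tied with position i.
        j = i
        while j + 1 < n and ordered_values[j + 1] == ordered_values[i]:
            j += 1
        # Positions are 1-based, so the run occupies positions i+1 .. j+1.
        average = ((i + 1) + (j + 1)) / 2
        ranks.extend([average] * (j - i + 1))
        i = j + 1
    return ranks


bonds = ["AAA", "AA", "AA", "A", "A", "A", "BBB", "BB", "B", "CC", "C", "D"]
print(average_ranks(bonds))
# [1.0, 2.5, 2.5, 5.0, 5.0, 5.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]

placings = [1, 2, 3, 3, 5, 6, 7]
print(average_ranks(placings))
# [1.0, 2.0, 3.5, 3.5, 5.0, 6.0, 7.0]
```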
Rankings with ties
When rankings involve ties they give us two extra problems:
- how to deal with the ties
- the short cut formula may be unreliable if there are too many ties, and we need to use a longer formula
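The Full Spearman formula. Use when there are ties!
$r_s = \dfrac{n \sum_{i=1}^{n} X_i Y_i - \left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{\sqrt{\left[n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2\right]\left[n \sum_{i=1}^{n} Y_i^2 - \left(\sum_{i=1}^{n} Y_i\right)^2\right]}}$
where $X_i$ and $Y_i$ are the two ranks of the ith observation.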
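A minimal Python sketch of the full formula is given below. It is applied to the soft-drink ranks from the taste-trial table, which contain no ties, so the result should agree with the short cut formula. The function name `spearman_full` is an illustrative choice.

```python
import math


def spearman_full(rank_x, rank_y):
    """Spearman's rank correlation via the full formula
    (Pearson's r applied to the ranks)."""
    n = len(rank_x)
    sum_x = sum(rank_x)
    sum_y = sum(rank_y)
    sum_xy = sum(x * y for x, y in zip(rank_x, rank_y))
    sum_x2 = sum(x * x for x in rank_x)
    sum_y2 = sum(y * y for y in rank_y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Refreshing and sweetness ranks for the ten soft drinks (no ties).
refresh_rank = [1, 2, 5, 6, 3, 4, 10, 9, 7, 8]
sweetness_rank = [8, 7, 10, 9, 5, 6, 2, 3, 1, 4]

rs_full = spearman_full(refresh_rank, sweetness_rank)
print(round(rs_full, 3))  # -0.624
```

When ties are present, the inputs should be the tie-averaged ranks rather than the raw positions.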
Example - using the “short cut” formula
Drink       Refresh Rank   Sweetness Rank   di    di^2
Slurp       1              8                -7    49
Fizz        2              7                -5    25
Fizz Plus   5              10               -5    25
Binge       6              9                -3    9
Slam        3              5                -2    4
Dunk        4              6                -2    4
Whizz       10             2                8     64
Pling       9              3                6     36
Tweak       7              1                6     36
Blitz       8              4                4     16
Total                                             268
Result!
rs = 1 - 6*268 / (10*99) = -0.624
This indicates quite a strong negative relationship between refreshingness and sweetness (as we saw in the scattergram).
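The short cut calculation can be reproduced with a few lines of Python. This sketch uses the ranks from the soft-drink table; the function name `spearman_shortcut` is hypothetical.

```python
def spearman_shortcut(rank_x, rank_y):
    """Spearman's rs = 1 - 6 * sum(d^2) / (n * (n^2 - 1)); suitable when ties are few."""
    n = len(rank_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


refresh = [1, 2, 5, 6, 3, 4, 10, 9, 7, 8]
sweetness = [8, 7, 10, 9, 5, 6, 2, 3, 1, 4]

sum_d_squared = sum((x - y) ** 2 for x, y in zip(refresh, sweetness))
rs = spearman_shortcut(refresh, sweetness)
print(sum_d_squared)  # 268
print(round(rs, 3))   # -0.624
```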
Example: using the “long” formula
A students’ association’s satisfaction ratings for 8 courses, and the seniority of the person taking the course, are listed below. Use Spearman’s Rank C.C. to investigate the relationship between the two.
End of Module 2
We are moving on to Module 3!
Module 3
Probability & Probability Distributions
Probability
What is meant by the word “probability”?
Probability is the likelihood or chance that a particular event will occur.
Three approaches to probability
1. A priori classical probability
2. Empirical classical probability
3. Subjective probability
A priori classical probability
The probability of success is based on prior knowledge of the process involved.
Probability of occurrence = X / T
X = number of ways in which the event occurs
T = total number of elementary outcomes
Example for a priori classical probability
A box contains 20 chocolate beans, of which 10 are red and the other 10 are green.
The probability of selecting a red bean on a single draw is 10 / 20 = 0.5.
Because we know in advance the total number of beans and the proportion of each color, we call it “a priori probability”.
Empirical classical probability
Empirical classical probability adopts the same formula to calculate the probability of occurrence.
Probability of occurrence = X / T
However, in empirical classical probability, the probability of success is based on observed data instead of prior knowledge (a priori).
Example for empirical classical probability
Your mid-term exam is coming and this exam is said to be optional, which means you can choose whether or not to take it.
If we take a poll asking how many students will attend the exam and 99% of students say they will, we say there is a 0.99 probability that an individual student will attend the exam.
Remember, in this example we did not know in advance how many students wished to take the exam. This differs from the a priori classical example, in which we already knew that 50% of the beans were red and 50% were green.
So, empirical probability is based on observation rather than on prior knowledge of the process.
Subjective probability
From the name we can infer that this approach to probability is based on a person’s own judgment.
For instance: you think you have a 90% probability of passing the CPA exam, while your supervisor thinks your probability of passing is 60%.
Both probabilities are based on personal judgment and experience rather than on objective data.
Sample spaces and events
Event: each possible type of occurrence is referred to as an event.
Simple event: a simple event can be described by a single characteristic.
Sample space: the collection of all the possible events is called the sample space.
Axioms about probability
Given a sample space S = {E1, E2, …, En}, the probabilities assigned to the Ei must satisfy:
0 ≤ P(Ei) ≤ 1 for each i
If an event has no chance of occurring, its probability is 0, and if an event is certain to occur, its probability is 1.
P(E1) + P(E2) + … + P(En) = ∑P(Ei) = 1
Probability of event A = sum of the probabilities of the simple events comprising A.
Contingency tables
By example:
Intent to purchase investigation
This kind of investigation often takes place in sales and marketing research scenarios.
In this example: the sample space is 1,000 households, classified by their purchase behavior for a laptop computer.
In the investigation, there are basically two intents regarding the purchase.
Sub-samples
1. Planned to purchase: 300 households
2. Not planned to purchase: 700 households
After the purchase behavior has taken place, we can further subdivide the sample of 1,000 households into:
1. Actually purchased
2. Not purchased
Now, in this example, within the big sample of 1,000, we have four different sub-samples:
1. Planned to purchase
2. Not planned to purchase
3. Purchased
4. Not purchased
But later, the outcomes of actual purchase and no purchase turned out not to be fully consistent with the originally stated intents.
In the first category (planned to purchase, 300 households), 200 out of 300 actually purchased and the remaining 100 did not.
In the second category (not planned to purchase, 700 households), 50 out of 700 actually purchased; the remaining 650 were consistent with their initial intent.
Complement and joint event
The complement of event A includes all events that are not part of event A. The complement of A is given by the symbol A’ or $\bar{A}$.
In the above example, the 700 households that did not plan to purchase are the complement of the 300 that planned to purchase.
Joint event: a joint event is an event that has two or more characteristics.
In the above example, the event “planned to purchase and actually purchased” is a joint event.
Usually there are two ways to depict events in a sample space:
Contingency table, also called “table of cross-classification”
Venn diagram
Now, based on the above example, we learn to construct the contingency table and Venn diagram.
Contingency table
                     Planned to purchase
Actually purchased   Yes    No     Total
Yes                  200    50     250
No                   100    650    750
Total                300    700    1000
Terms
Intersection A∩B: both A and B occur together, the joint event (sometimes simply written as AB).
Union A∪B: either A or B or both.
Other common forms of notation include A∨B, A+B, A OR B.
Example of using the above two notations
Number (n) of cards that are a Heart or an Ace in a deck of playing cards (52 cards):
n(H∪A) = n(H) + n(A) - n(H∩A) = 13 + 4 - 1 = 16
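The count can be verified by building the deck as a Python set and using set union and intersection. This is only an illustrative sketch; the rank and suit labels are arbitrary choices.

```python
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Hearts", "Diamonds", "Clubs", "Spades"]

# The full deck: every (rank, suit) pair.
deck = set(product(ranks, suits))

hearts = {card for card in deck if card[1] == "Hearts"}
aces = {card for card in deck if card[0] == "A"}

n_union = len(hearts | aces)          # n(H ∪ A)
n_intersection = len(hearts & aces)   # n(H ∩ A)

print(len(deck), len(hearts), len(aces), n_intersection)  # 52 13 4 1
print(n_union)                                            # 16
```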
Complement
Complement A’: event A does not occur; another form: NOT A.
Example: non-hearts = H’, n(H’) = 39
Complement rule: P(A) = 1 - P(A’)
Mutually exclusive and collectively exhaustive
Mutually exclusive: occurrence of one event precludes occurrence of another. If A and B are mutually exclusive, then n(A∩B) = 0.
Collectively exhaustive: the events together comprise the sample space; at least one event is certain to occur.
Example: n(female students ∪ male students) = 26 (QM course).
More on mutually exclusive and collectively exhaustive events
Everyone is either female or male (collectively exhaustive), but no one is both (mutually exclusive).
Being female and being male are mutually exclusive and collectively exhaustive events.
In the TV set purchase example: planned to purchase or not planned to purchase. Everyone either planned to purchase or did not (collectively exhaustive), but no one is both “planned to purchase” and “not planned to purchase” (mutually exclusive).
Probability contingency table
Numbers         M        M'       Total
O               7        14       21
O'              24       35       59
Total           31       49       80
Probabilities (each count divided by the total of 80)
Probabilities   M        M'       Total
O               0.0875   0.175    0.2625
O'              0.3      0.4375   0.7375
Total           0.3875   0.6125   1
General form of a 2×2 contingency table
Probabilities   A          A'          Total
B               P(A∩B)     P(A'∩B)     P(B)
B'              P(A∩B')    P(A'∩B')    P(B')
Total           P(A)       P(A')       1
Simple (marginal) probability: P(A)
The most fundamental rule for probabilities is that they range from 0 to 1.
Simple (marginal) probability, P(A), refers to the probability of occurrence of a simple event.
Example: what is the probability that a heart is selected from a deck of playing cards?
P(heart) = 13 / 52 = 0.25
Joint probability: P(A∩B)
Joint probability refers to situations involving two or more events, such as the probability of planned to purchase and actually purchased in the big-screen TV set purchase example.
Joint probability means that both events A and B must occur simultaneously.
So, P(planned ∩ purchased) = 200/1000 = 0.2
Computing marginal probability
In fact, the marginal probability of an event is the sum of a set of joint probabilities.
The formula: P(A) = P(A and B1) + P(A and B2) + … + P(A and Bk), where B1, B2, …, Bk are mutually exclusive and collectively exhaustive events.
In the previous example:
P(planned to purchase) = P(planned to purchase and purchased) + P(planned to purchase and did not purchase) = 200/1000 + 100/1000 = 0.30
Addition rule
P(A∪B) = P(A) + P(B) - P(A∩B)
N.B. If A and B are mutually exclusive, then P(A∩B) = 0, and P(A∪B) = P(A) + P(B)
Multiplication rule
P(A∩B) = P(B|A)P(A) = P(A|B)P(B)
and it follows that P(B|A) = P(A∩B) / P(A) or P(A|B) = P(A∩B) / P(B)
The bar symbol “|” means “given”.
P(B|A) is the probability of B happening given that A happens. This is known as a conditional probability.
Conditional probability
To spot conditional probabilities, look for words like “of”, “if” and “given”. Suppose:
D = “part is defective”, and B = “part was produced by B”. Each of the following would tell you P(D|B):
- If a part was produced by B, there is a 5% chance it is defective.
- 5% of the parts produced by B are defective.
- There is a 5% chance that a part is defective, given that it was produced by B.
Back to the TV purchase example
P(B|A) = P(A and B) / P(A)
Here: A = planned to purchase, B = actually purchased
P(actually purchased | planned to purchase) = (planned to purchase and actually purchased) / (planned to purchase) = 200 / 300 = 0.67
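As an illustration, the joint, marginal and conditional probabilities for the purchase example can be computed in Python directly from the four cell counts of the contingency table. The dictionary layout and variable names are assumptions made for this sketch.

```python
# Cell counts of the purchase contingency table: (intent, outcome) -> households.
counts = {
    ("planned", "purchased"): 200,
    ("planned", "not purchased"): 100,
    ("not planned", "purchased"): 50,
    ("not planned", "not purchased"): 650,
}
total = sum(counts.values())  # 1000 households

# Joint probability P(planned and purchased).
p_joint = counts[("planned", "purchased")] / total

# Marginal probability P(planned): sum of the joint probabilities in that category.
p_planned = sum(v for (intent, _), v in counts.items() if intent == "planned") / total

# Marginal probability P(purchased).
p_purchased = sum(v for (_, outcome), v in counts.items() if outcome == "purchased") / total

# Conditional probability P(purchased | planned) = P(planned and purchased) / P(planned).
p_purchased_given_planned = p_joint / p_planned

print(p_joint, p_planned, p_purchased)       # 0.2 0.3 0.25
print(round(p_purchased_given_planned, 3))   # 0.667
```

Since P(purchased | planned) differs from P(purchased), planning to purchase and actually purchasing are not independent in this sample.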
Independence
Two events, A and B, are independent, if the probability of A occurring is not affected by B and vice versa.
A, B independent if :
P(A) = P(A|B) , P(B) = P(B|A)
P(AB) = P(A)P(B) only if A and B are independent.
Bayes’ Theorem
Bayes’ rule is useful in decision analysis. Let’s learn it through an example as follows:
A machine is known to be in good condition 90% of the time. If in good condition, only 1% of output is defective. If in poor condition, 10% of output is defective.
An item of output is observed to be defective. Given this information what is the probability that the machine is in good condition?
Solution
G: condition of machine is good.
D: an item of output is defective.
Probabilities:
Prior (pre-condition): P(G) = 0.9, P(G’) = 0.1
Conditional: P(D|G) = 0.01, P(D|G’) = 0.10
P(G|D) = P(D|G)*P(G) / P(D) (conditional probability)
But we need to find P(D).
P(defect) = P(defect and good condition) + P(defect and poor condition)
P(D) = P(G∩D) + P(G’∩D) = P(D|G)P(G) + P(D|G’)P(G’) = 0.01*0.9 + 0.10*0.1 = 0.019
Then: P(G|D) = 0.009 / 0.019 = 0.47
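The machine example can be written as a small Python function that applies Bayes' rule to a two-state problem. The function name `posterior` and its parameter names are hypothetical; the inputs are the probabilities given in the example.

```python
def posterior(prior, likelihood, likelihood_given_complement):
    """P(G|D) from P(G), P(D|G) and P(D|G') using Bayes' rule.

    The evidence P(D) is the sum of the two joint probabilities:
    P(D) = P(D|G)P(G) + P(D|G')P(G').
    """
    evidence = likelihood * prior + likelihood_given_complement * (1 - prior)
    return likelihood * prior / evidence


# P(G) = 0.9, P(D|G) = 0.01, P(D|G') = 0.10
p_good_given_defect = posterior(0.9, 0.01, 0.10)
print(round(p_good_given_defect, 2))  # 0.47
```

Observing a defective item lowers the probability that the machine is in good condition from 0.9 to about 0.47.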
Expression of Bayes’ Rule
P(A|B) = P(B|A)P(A) / P(B)
This follows from the formula for joint probability, since P(B|A)P(A) = P(A∩B).
Counting Rule 1
If any one of k different mutually exclusive and collectively exhaustive events can occur on each of n trials, the number of possible outcomes is equal to
$k^n$
Example for counting rule 1
A coin (two sides) is tossed 10 times; the number of outcomes is
$2^{10} = 1{,}024$
Counting Rule 2
If there are k1 events on the first trial, k2 events on the second trial, … and kn events on the nth trial, then the number of possible outcomes is
(k1)(k2)…(kn)
Example for counting rule 2
A license plate consists of 3 letters (26 letters in total, a, b, c, …, z) followed by 3 digits (0 to 9, ten possibilities each); the number of possible outcomes is:
26 × 26 × 26 × 10 × 10 × 10 = 17,576,000
Counting Rule 3
The number of ways that n objects can be arranged in order is:
n! = (n)(n-1)…(1)
0! = 1
“!” reads “factorial”. “n!” is read “n factorial”.
Example for counting rule 3
The number of ways that 6 books can be arranged is:
n!=6!= 6*5*4*3*2*1=720
Counting Rule 4
Permutations: the number of ways of arranging X objects selected from n objects in order is:
$\dfrac{n!}{(n - X)!}$
Permutation: each possible arrangement is called a permutation.
Example for counting rule 4
The number of ordered arrangements of 4 books selected from 6 books is:
$\dfrac{n!}{(n - X)!} = \dfrac{6!}{(6 - 4)!} = 360$
Counting Rule 5
Combinations: the number of ways of selecting X objects out of n objects, irrespective of order, is:
$\binom{n}{X} = \dfrac{n!}{X!(n - X)!}$
Example for counting rule 5 (also called the rule of combinations)
Selecting 4 books out of 6 books, the number of selections is (note: order is irrelevant):
$\binom{6}{4} = \dfrac{6!}{4!(6 - 4)!} = 15$
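The five counting rules can be checked against the lecture's examples with Python's standard `math` module (`math.perm` and `math.comb` require Python 3.8 or later). This is a quick verification sketch; the variable names are illustrative.

```python
import math

# Rule 1: k outcomes on each of n trials -> k ** n (coin tossed 10 times).
coin_outcomes = 2 ** 10

# Rule 2: product of the numbers of outcomes per trial (3 letters, then 3 digits).
plate_outcomes = 26 * 26 * 26 * 10 * 10 * 10

# Rule 3: orderings of n objects -> n! (6 books on a shelf).
book_orderings = math.factorial(6)

# Rule 4: ordered arrangements of X objects chosen from n -> n! / (n - X)!
book_permutations = math.perm(6, 4)

# Rule 5: unordered selections of X objects from n -> n! / (X! (n - X)!)
book_combinations = math.comb(6, 4)

print(coin_outcomes)      # 1024
print(plate_outcomes)     # 17576000
print(book_orderings)     # 720
print(book_permutations)  # 360
print(book_combinations)  # 15
```

The permutation and combination counts are related by 360 = 15 × 4!, since each selection of 4 books can be ordered in 4! ways.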