Midterm 1
• Well done!!
• Mean 80.23%
• Median 84.6%
• Standard deviation of 16.24 ppt.
• 5th percentile is 53.
Statistics for Social and Behavioral Sciences
Session #9: Probabilities
(Agresti and Finlay, Chapter 9)
Prof. Amine Ouazad
Statistics Course Outline
PART I. INTRODUCTION AND RESEARCH DESIGN (Week 1)
PART II. DESCRIBING DATA (Weeks 2-4)
PART III. DRAWING CONCLUSIONS FROM DATA: INFERENTIAL STATISTICS (Weeks 5-9)
PART IV. CORRELATION AND CAUSATION: REGRESSION ANALYSIS (Weeks 10-14)
This is where we talk about Zmapp and Ebola!
Firenze or Lebanese Express?
Wrap-up on First Part
1. We are curious… we ask empirical questions.
2. We design the study:
– Data collection, e.g. by simple random sampling.
– Nonresponse bias, response bias, sampling bias.
3. We describe the data:
– Using statistics:
• Univariate: mean, standard deviation, variance.
• Bivariate: correlation, slope, R squared, TSS, ESS, SSE.
– We measure statistics but are interested in parameters… Statistics suffer from sampling error.
4. How can we make inferences? E.g. concluding that the coin is balanced.
➥ This is the focus of the next 3-4 weeks.
Outline
1. Probabilities
– The Three Rules
2. Random Variable
– Expectation of a random variable
Next time: Probability Distributions (continued) Chapter 4 of A&F
Probability and Luck
• We play a game together…
– Heads you win 10 dirham.
– Tails I win 10 dirham.
• We play the game a very large number of times.
• Should you play this game?
• P(heads) = 0.5, P(tails) = 0.5
• P(heads) = 1 – P(not heads)
• P(heads) is read as “probability of heads”.
• Game sequence:
– In the long run, with a balanced coin, 0.5 of the trials will lead to heads, 0.5 of the trials will lead to tails.
– The probability of heads is the value that the ratio of the number of heads to the number of trials approaches as the number of draws grows without bound…
Probability and Luck
Perform the game for a very large number of draws.
… the longer the game, the closer the ratio tends to be to 0.5.
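The long-run idea can be tried out on a computer. Below is a minimal sketch, assuming a simulated balanced coin; the function name `running_share_of_heads` and the fixed seed are choices made for this illustration, not part of the course material.

```python
import random

def running_share_of_heads(n_tosses, seed=0):
    """Toss a simulated balanced coin n_tosses times.

    Returns a list whose i-th entry is the share of heads after i+1 tosses.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is reproducible
    heads = 0
    shares = []
    for toss in range(1, n_tosses + 1):
        heads += rng.random() < 0.5  # True counts as 1, False as 0
        shares.append(heads / toss)
    return shares

shares = running_share_of_heads(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>7} tosses: share of heads = {shares[n - 1]:.4f}")
```

The share after 10 tosses can be far from 0.5; the share after 100,000 tosses should be very close to it.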
Sometimes we can’t repeat our choices
Life is full of random events… but
• We only draw one job at the end of university.
– Hard to know what other incomes/jobs we would have gotten.
• We only draw one opponent in a football game.
– Subsequent games are not identical to this one.
– What is the probability of winning?
• We only die once at a particular age.
– What is the probability of death at age 50?
• In such a case we define the probability of an event as the ratio of the number of such events to the number of individuals in identical circumstances.
– … for a very large number of such individuals.
• Example: number of individuals with the same degree, same age as me:
• What is the probability of earning more than $45,000 in my first job?
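A minimal sketch of this "ratio among comparable individuals" definition. The earnings list is entirely made up for illustration (ten hypothetical graduates), and `empirical_probability` is a helper name introduced here.

```python
# Hypothetical first-job earnings (USD) of graduates with the same degree and age.
# These numbers are invented for illustration; they are not real data.
earnings = [38_000, 52_000, 41_000, 47_500, 60_000,
            44_000, 45_000, 49_000, 36_500, 55_000]

def empirical_probability(values, event):
    """Number of individuals for whom the event holds, over the number of individuals."""
    return sum(1 for v in values if event(v)) / len(values)

p_above_45k = empirical_probability(earnings, lambda y: y > 45_000)
print(p_above_45k)  # 0.5 with the made-up numbers above
```

With only ten individuals this ratio is a rough estimate; the definition calls for a very large number of comparable individuals.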
Sometimes we can’t repeat our choices?
Groundhog Day
A weatherman finds himself living the same day over and over again.
« a blizzard develops that Connors had predicted would miss them, closing the roads and shutting down long-distance phone service, forcing the team to return to Punxsutawney. Connors awakens the next morning, however, to find it is again February 2, and his day unfolds in exactly the same way. He is aware of the repetition, but everyone else seems to be living February 2 exactly the same way and for the first time. »
Edge of Tomorrow
Lt. Col. Bill Cage is an officer who has never seen a day of combat when he is unceremoniously dropped into what amounts to a suicide mission. Killed within minutes, Cage now finds himself inexplicably thrown into a time loop, forcing him to live out the same brutal combat over and over, fighting and dying again and again.
Are these really independent events??
Probability and Luck
• What is the probability that you win twice in a row?
– P(heads in the first round) * P(heads in the second round) = 0.5 * 0.5 = 0.25
– Because the draws in the first and the second round are independent events.
• What is the probability that you win k times in a row?
– P(heads in the first round) * P(heads in the second round) * … * P(heads in the kth round) = 0.5^k
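The product of probabilities for k independent rounds can be computed directly. A minimal sketch, assuming a balanced coin with P(heads) = 0.5 as on the earlier slide; `p_wins_in_a_row` is a name introduced for this illustration.

```python
from fractions import Fraction

def p_wins_in_a_row(k, p_heads=Fraction(1, 2)):
    """Probability of heads in each of k independent rounds: p_heads multiplied k times."""
    return p_heads ** k

for k in (1, 2, 5, 10):
    print(k, p_wins_in_a_row(k))
```

Winning twice in a row has probability 1/4, and ten times in a row 1/1024: long winning streaks quickly become rare.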
Rules for probability distributions
In general, we talk about the probability of an event.
– What is the probability that « it rains tomorrow »?
For an event A…
1. P(not A) = 1 – P(A).
If A and B are distinct possible events (with no overlap, i.e. P(A and B) = 0), then
2. P(A or B) = P(A) + P(B).
If A and B are two (possibly related) events,
3. P(A and B) = P(A) x P(B given that A has occurred).
Special case: If A and B are independent, i.e. P(B given A) = P(B),
3’. P(A and B) = P(A) x P(B)
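The three rules can be checked by counting outcomes. A minimal sketch, assuming one throw of a balanced six-sided die; the events `even`, `one` and `high` and the helpers `prob` and `prob_given` are choices made for this illustration.

```python
from fractions import Fraction

OUTCOMES = frozenset(range(1, 7))  # faces of a balanced six-sided die

def prob(event):
    """P(event) when all outcomes are equally likely."""
    return Fraction(len(event & OUTCOMES), len(OUTCOMES))

def prob_given(b, a):
    """P(B given A): share of the outcomes in A that are also in B."""
    return Fraction(len(a & b), len(a))

even = frozenset({2, 4, 6})
one = frozenset({1})
high = frozenset({4, 5, 6})

# Rule 1: P(not A) = 1 - P(A)
rule_1 = prob(OUTCOMES - even) == 1 - prob(even)

# Rule 2: "even" and "one" do not overlap, so P(A or B) = P(A) + P(B)
rule_2 = prob(even | one) == prob(even) + prob(one)

# Rule 3: "even" and "high" are related events, so use P(B given A)
rule_3 = prob(even & high) == prob(even) * prob_given(high, even)

# The independence shortcut 3' does not apply here: P(high given even) != P(high)
independent = prob_given(high, even) == prob(high)

print(rule_1, rule_2, rule_3, independent)  # True True True False
```

Here P(high given even) = 2/3 while P(high) = 1/2, so multiplying P(even) by P(high) would give the wrong answer for P(even and high).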
Applications: Coins
1. P(getting tails) = P(not getting heads) = 1 – P(getting heads)
2. P(tails) + P(heads) = P(tails or heads) = 1
3. P(tails in 1st throw and tails in 2nd throw) = P(tails in 1st throw) x P(tails in 2nd throw given tails in 1st throw).
With independence:
P(tails in 1st throw and tails in 2nd throw) = P(tails in 1st throw) x P(tails in 2nd throw)
Applications: Dice
1. P(not throwing 4) = 1 – P(throwing 4) = 5/6
2. P(throwing 4) + P(throwing 5) = P(throwing 4 or 5) = 2/6
3. P(throwing 4 in 1st throw and throwing 5 in 2nd throw) = P(throwing 4 in 1st throw) x P(throwing 5 in 2nd throw given throwing 4 in 1st throw).
With independence:
P(throwing 4 in 1st throw and throwing 5 in 2nd throw) = P(throwing 4 in 1st throw) x P(throwing 5 in 2nd throw) = 1/36
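The two-throw rule can be checked by listing all 36 equally likely pairs. A minimal sketch, assuming a balanced six-sided die; the faces used in the example calls and the helper names `p_face` and `p_pair` are illustrative choices.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)                          # a balanced six-sided die
TWO_THROWS = list(product(FACES, repeat=2))  # 36 equally likely (first, second) pairs

def p_face(face):
    """P(one throw shows `face`); a value that is not on the die gets probability 0."""
    return Fraction(sum(1 for f in FACES if f == face), len(FACES))

def p_pair(first, second):
    """P(1st throw shows `first` and 2nd throw shows `second`), by counting pairs."""
    hits = sum(1 for a, b in TWO_THROWS if a == first and b == second)
    return Fraction(hits, len(TWO_THROWS))

print(p_pair(4, 6), p_face(4) * p_face(6))  # both are 1/36
print(p_face(7))                            # 0: there is no face 7 on this die
```

Counting pairs and multiplying the single-throw probabilities give the same answer because the two throws are independent.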
Inverse Probability Fallacy
• When asked about the probability of the disease given the symptom, P(disease | symptom), clinicians tend to answer with the probability of the symptom given the disease, P(symptom | disease).
There is an equal number of blue and green cabs in the capital of Happinistan. The color of a cab is independent of whether it has an accident.
• What is the probability that a taxi has been involved in an accident given that it is green?
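Returning to the disease and symptom example: rule 3 can be written both ways, P(A and B) = P(A) x P(B given A) = P(B) x P(A given B), which shows how far apart the two conditional probabilities can be. A minimal sketch; every number below (prevalence and symptom rates) is hypothetical and chosen only to make the arithmetic easy.

```python
from fractions import Fraction

# Hypothetical inputs, invented for illustration only.
p_disease = Fraction(1, 100)              # P(disease)
p_symptom_given_disease = Fraction(9, 10) # P(symptom | disease)
p_symptom_given_healthy = Fraction(1, 10) # P(symptom | no disease)

# Rule 3: P(symptom and disease) = P(disease) x P(symptom | disease)
p_symptom_and_disease = p_disease * p_symptom_given_disease

# Rules 1-3: P(symptom) = P(symptom and disease) + P(symptom and no disease)
p_symptom = p_symptom_and_disease + (1 - p_disease) * p_symptom_given_healthy

# Rule 3 the other way round: P(symptom and disease) = P(symptom) x P(disease | symptom)
p_disease_given_symptom = p_symptom_and_disease / p_symptom

print(p_symptom_given_disease)   # 9/10
print(p_disease_given_symptom)   # 1/12
```

With these made-up numbers, 90% of sick patients show the symptom, yet only about 8% of patients with the symptom are sick. Answering 90% to the second question is the inverse probability fallacy.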
Gloms and Fizos
Random variable
A random variable is a variable whose value is not known ex-ante: it can take one of several possible values, and the realized value is only known ex-post.
• Example:
– X is a random variable that, before the coin is tossed (ex-ante), can take values « Heads » or « Tails ». Once the coin is tossed (ex-post), the value of X is known: it is either « Heads » or « Tails ».
– Y is a random variable that can take values 1, 2, 3, 4, 5, or 6 depending on the throw of a die. Before the die is thrown, the value is not known. After the die is thrown, we know the value of Y.
Probability distribution of a random variable
• Take all possible values of a random variable Y:
– Example: 1, 2, 3, 4, 5, 6
– In general: y1, y2, y3, …, yK.
• The probability of the event that the random variable Y equals yk is denoted P(Y=yk) or simply P(yk).
• The probability distribution of random variable Y is the list of all values of P(Y=yk).
• Example: for a balanced die, the probability distribution of Y is the list of values P(Y=1), P(Y=2), P(Y=3), …, which is {1/6, 1/6, 1/6, 1/6, 1/6, 1/6}.
All throughout the course we consider either discrete quantitative random variables or categorical random variables.
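A probability distribution of this kind can be stored as a table from values to probabilities. A minimal sketch, assuming a balanced die and a balanced coin; the dictionary layout and the helper `is_valid_distribution` are choices made for this illustration.

```python
from fractions import Fraction

# Discrete quantitative random variable: the face shown by a balanced die.
die_distribution = {y: Fraction(1, 6) for y in range(1, 7)}

# Categorical random variable: the side shown by a balanced coin.
coin_distribution = {"Heads": Fraction(1, 2), "Tails": Fraction(1, 2)}

def is_valid_distribution(distribution):
    """Each probability lies between 0 and 1, and the probabilities sum to one."""
    probabilities = list(distribution.values())
    return all(0 <= p <= 1 for p in probabilities) and sum(probabilities) == 1

print(die_distribution[4])                       # P(Y=4) = 1/6
print(is_valid_distribution(die_distribution))   # True
print(is_valid_distribution(coin_distribution))  # True
```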
Expected value of a random variable
What are your expected gains when playing the coin game?
• Gain is a random variable, equal to +10 AED when getting heads, and -10 AED when getting tails.
E(gain) = Gain when getting heads x Probability of heads + Gain when getting tails x Probability of tails = 10 x 0.5 + (-10) x 0.5 = 0.
In general, for a random variable Y, the expected value of Y is:
• E(Y) = Σ yk P(Y=yk)
Also note that probabilities sum to one.
Σ P(Y=yk) = 1
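The expected value formula is a probability-weighted sum, which is short to compute. A minimal sketch, assuming the coin game with a balanced coin and, as a second example, a balanced die; `expected_value` is a name introduced here.

```python
from fractions import Fraction

def expected_value(distribution):
    """E(Y): sum over all values yk of yk x P(Y=yk)."""
    return sum(y * p for y, p in distribution.items())

# Gain in AED from the coin game: +10 on heads, -10 on tails.
coin_gain = {+10: Fraction(1, 2), -10: Fraction(1, 2)}

# Face shown by a balanced die.
die = {y: Fraction(1, 6) for y in range(1, 7)}

print(expected_value(coin_gain))  # 0
print(expected_value(die))        # 7/2
```

Note that the expected value need not be a value the variable can actually take: no single game pays 0 AED, and no die face shows 3.5.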
Expected Earnings?
• « Your annual earnings right after NYU Abu Dhabi » is a random variable…
– The variable has not been realized yet.
Let’s give it a name: Y = « Your annual earnings right after NYU Abu Dhabi ».
• E(earnings) = E(Y) = Σ yk P(Y=yk)
– Y takes potentially K values.
• Problemo: We don’t observe earnings in the future!!!
An approximation is to use the distribution of current graduates… to substitute for our lack of knowledge of P(Y=yk) for each k.
• Earnings take K distinct values: no two graduates earn exactly the same annual wage…
• Hence an approximation of expected earnings is E(Y) ≈ Σ yk x (1/K)
• This is the average earnings of current graduates…
• But that’s only an approximation!! What could be wrong?
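Giving each of the K observed graduates probability 1/K turns the expected value formula into the ordinary average. A minimal sketch; the five earnings figures are invented for illustration and `approximate_expected_earnings` is a hypothetical helper name.

```python
from fractions import Fraction

# Hypothetical annual earnings (USD) of K current graduates; invented numbers.
graduate_earnings = [38_000, 52_000, 41_000, 47_500, 60_000]

def approximate_expected_earnings(values):
    """Sum of yk x (1/K): each observed value gets the same probability 1/K."""
    k = len(values)
    return sum(y * Fraction(1, k) for y in values)

approx = approximate_expected_earnings(graduate_earnings)
print(approx)                                           # 47700
print(sum(graduate_earnings) / len(graduate_earnings))  # 47700.0, the plain average
```

The calculation is only as good as the assumption behind it, namely that current graduates are a fair stand-in for your own future earnings distribution.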
Properties of the Expectation
The expectation of the sum is the sum of the expectations:
• E(earnings – debt) = E(earnings) – E(debt)
The expectation of a constant x the random variable is the constant x the expectation:
• E(Constant x Earnings) = Constant x E(Earnings)
E.g. E(Earnings in AED) = 3.6 x E(Earnings in USD)
Beware!!!
• E(X Y) is not E(X) E(Y) in general.
• When X and Y are independent, E(X Y) = E(X) E(Y).
• Law of conditional expectation: E(X) = E(E(X|Z))
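These properties can be checked on a small example. A minimal sketch, assuming X and Y are two independent balanced dice and Z is the event « X is even »; the setup and the helper `expect` are choices made for this illustration, and the constant 18/5 simply mirrors the 3.6 used above.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
P_FACE = Fraction(1, 6)

def expect(f):
    """E[f(X, Y)] for two independent balanced dice X and Y."""
    return sum(f(x, y) * P_FACE * P_FACE for x, y in product(FACES, repeat=2))

e_x = expect(lambda x, y: x)  # 7/2
e_y = expect(lambda x, y: y)  # 7/2
c = Fraction(18, 5)           # the constant 3.6 written as a fraction

sum_rule = expect(lambda x, y: x + y) == e_x + e_y
constant_rule = expect(lambda x, y: c * x) == c * e_x
product_independent = expect(lambda x, y: x * y) == e_x * e_y

# X is not independent of itself, so E(X X) differs from E(X) E(X).
product_dependent = expect(lambda x, y: x * x) == e_x * e_x

# Law of conditional expectation with Z = "X is even":
# E(X | even) = 4 and E(X | odd) = 3, each case has probability 1/2.
e_x_given_even = Fraction(2 + 4 + 6, 3)
e_x_given_odd = Fraction(1 + 3 + 5, 3)
conditional_law = Fraction(1, 2) * e_x_given_even + Fraction(1, 2) * e_x_given_odd == e_x

print(sum_rule, constant_rule, product_independent, product_dependent, conditional_law)
# True True True False True
```

The single False is the "Beware" case: E(X X) = 91/6 while E(X) E(X) = 49/4.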
Wrap Up
• Four rules of probability distributions
1. P(not A) = 1 – P(A)
2. P(A or B) = P(A) + P(B) when P(A and B) = 0
3. P(A and B) = P(A) P(B given A)
Beware of the inverse probability fallacy: P(B given A) is not P(A given B)
3’. P(A and B) = P(A) P(B) when A and B are independent
• Random variable
– Variable whose value has not been realized.
• Probability distribution of a random variable
– List of the probabilities of the values of the random variable.
• Expected value of a random variable: E(Y) = Σ yk P(Y=yk)
– E(X+Y) = E(X) + E(Y), E(cX) = c E(X), E(X) = E(E(X|Z))
Coming up:
Readings:
• Chapter 4 entirely – full of interesting examples and super relevant.
• Online quiz on Thursday night.
• No slide due on Thursday.
For help:
• Amine Ouazad
Office 1135, Social Science [email protected]
Office hour: Tuesday from 5 to 6.30pm.
• GAF: Irene [email protected] recitations.
At the Academic Resource Center, Monday from 2 to 4pm.