Midterm 1: Well done! Mean 80.23%, median 84.6%, standard deviation of 16.24 ppt, 5th percentile is 53.

Upload: silas-blair

Post on 23-Dec-2015


Page 1

Midterm 1

• Well done!
• Mean 80.23%
• Median 84.6%
• Standard deviation of 16.24 ppt.
• 5th percentile is 53.

Page 2

Statistics for Social and Behavioral Sciences

Session #9: Probabilities

(Agresti and Finlay, Chapter 9)

Prof. Amine Ouazad

Page 3

Statistics Course Outline

PART I. INTRODUCTION AND RESEARCH DESIGN (Week 1)

PART II. DESCRIBING DATA (Weeks 2-4)

PART III. DRAWING CONCLUSIONS FROM DATA: INFERENTIAL STATISTICS (Weeks 5-9)

PART IV. CORRELATION AND CAUSATION: REGRESSION ANALYSIS (Weeks 10-14)

This is where we talk about Zmapp and Ebola!

Firenze or Lebanese Express?

Page 4

Wrap-up on First Part

1. We are curious… we ask empirical questions.
2. We design the study:
– Data collection, e.g. by simple random sampling.
– Nonresponse bias, response bias, sampling bias.
3. We describe the data:
– Using statistics:
  • Univariate: mean, standard deviation, variance.
  • Bivariate: correlation, slope, R squared, TSS, ESS, SSE.
– We measure statistics but are interested in parameters… Statistics suffer from sampling error.
4. How can we make inferences? E.g. concluding that the coin is balanced.

➥ This is the focus of the next 3-4 weeks.

Page 5

Outline

1. Probabilities
   The Three Rules

2. Random Variable
   Expectation of a random variable

Next time: Probability Distributions (continued), Chapter 4 of A&F

Page 6

Probability and Luck

• We play a game together…
– Heads you win 10 dirham.
– Tails I win 10 dirham.
• We play the game a very large number of times.
• Should you play this game?
• P(heads) = 0.5, P(tails) = 0.5

Page 7

Probability and Luck

• P(heads) = 1 – P(not heads)
• P(heads) is read as “probability of heads”.
• Game sequence: play the game for a very large number of draws.
– In the long run, with a balanced coin, 0.5 of the trials will lead to heads, 0.5 of the trials will lead to tails.
– The probability of heads is the ratio of the number of heads to the number of trials, in the limit of an infinite number of draws…
– … the longer the game, the closer the ratio will tend to be to 0.5.
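The long-run ratio described on this slide can be illustrated with a small simulation. This is only a sketch: the function name `heads_ratio`, the fixed seed, and the chosen numbers of tosses are assumptions made for the example, and the coin is simulated with Python's pseudo-random generator.

```python
import random

def heads_ratio(n_tosses, seed=0):
    """Share of heads in n_tosses simulated tosses of a balanced coin."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses

# The ratio of heads to trials for longer and longer games.
for n in (10, 100, 10_000, 100_000):
    print(n, heads_ratio(n))
```

With few tosses the ratio can be far from 0.5; with many tosses it is typically very close to it.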

Page 8

Sometimes we can’t repeat our choices

Life is full of random events… but
• We only draw one job at the end of university.
– Hard to know what other incomes/jobs we would have gotten.
• We only draw one opponent in a football game.
– Subsequent games are not identical to this one.
– What is the probability of winning?
• We only die once at a particular age.
– What is the probability of death at age 50?

Page 9

Sometimes we can’t repeat our choices

• In such a case we define the probability of an event as the ratio of the number of such events to the number of individuals in identical circumstances.
– … for a very large number of such individuals.
• Example: take the individuals with the same degree and the same age as me.
– What is the probability of earning more than $45,000 in my first job?

Page 10

Sometimes we can’t repeat our choices?

Groundhog Day
A weatherman finds himself living the same day over and over again.
« a blizzard develops that Connors had predicted would miss them, closing the roads and shutting down long-distance phone service, forcing the team to return to Punxsutawney. Connors awakens the next morning, however, to find it is again February 2, and his day unfolds in exactly the same way. He is aware of the repetition, but everyone else seems to be living February 2 exactly the same way and for the first time. »

Edge of Tomorrow
Lt. Col. Bill Cage is an officer who has never seen a day of combat when he is unceremoniously dropped into what amounts to a suicide mission. Killed within minutes, Cage now finds himself inexplicably thrown into a time loop - forcing him to live out the same brutal combat over and over, fighting and dying again - and again.

Are these really independent events?

Page 11

Probability and Luck

• What is the probability that you win twice in a row?
– P(heads in the first round) * P(heads in the second round) = 0.5 * 0.5 = 0.25
– Because the draws in the first and the second round are independent events.
• What is the probability that you win k times in a row?
– P(heads in the first round) * P(heads in the second round) * … * P(heads in the kth round) = 0.5^k
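The product over k independent rounds can be computed exactly. The sketch below assumes a balanced coin, P(heads) = 0.5 in every round, as stated earlier in the session; the function name `prob_k_heads_in_a_row` is made up for this example.

```python
from fractions import Fraction

def prob_k_heads_in_a_row(k, p_heads=Fraction(1, 2)):
    """P(heads in each of k independent rounds) = p_heads multiplied by itself k times."""
    return p_heads ** k

for k in (1, 2, 5, 10):
    print(k, prob_k_heads_in_a_row(k))
```

The probability halves with every extra round, so long winning streaks become unlikely very quickly.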

Page 12

Rules for probability distributions

In general, we talk about the probability of an event.
– What is the probability that « it rains tomorrow »?

For an event A…
1. P(not A) = 1 – P(A).

If A and B are distinct possible events (with no overlap, i.e. P(A and B) = 0), then
2. P(A or B) = P(A) + P(B).

If A and B are two (possibly related) events,
3. P(A and B) = P(A) x P(B given that A has occurred).

Special case: If A and B are independent, i.e. P(B given A) = P(B),
3’. P(A and B) = P(A) x P(B)
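The rules can be checked by counting outcomes of one throw of a balanced six-sided die. This is an illustrative sketch: the events A, B and C and the helper names `prob` and `prob_given` are chosen for the example and do not come from the slides.

```python
from fractions import Fraction

FACES = {1, 2, 3, 4, 5, 6}  # balanced six-sided die: each face has probability 1/6

def prob(event):
    """Probability of an event, given as a set of faces."""
    return Fraction(len(event & FACES), len(FACES))

def prob_given(event, condition):
    """P(event given condition): share of the condition's outcomes also in event."""
    return Fraction(len(event & condition), len(condition))

A = {2, 4, 6}  # "even face"
B = {4, 5, 6}  # "face above 3" (overlaps with A)
C = {1}        # "face is 1" (no overlap with A)

# Rule 1: P(not A) = 1 - P(A)
assert prob(FACES - A) == 1 - prob(A)
# Rule 2: A and C do not overlap, so P(A or C) = P(A) + P(C)
assert prob(A | C) == prob(A) + prob(C)
# Rule 3: P(A and B) = P(A) x P(B given A)
assert prob(A & B) == prob(A) * prob_given(B, A)

# A and B are not independent: P(B given A) differs from P(B).
print(prob_given(B, A), prob(B))
```

Here rule 3’ would give the wrong answer for A and B, because knowing that the face is even changes the probability that it is above 3.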

Page 13

Applications: Coins

1. P(getting tails) = P(not getting heads) = 1 – P(getting heads)

2. P(tails) + P(heads) = P(tails or heads) = 1

3. P(tails in 1st throw and tails in 2nd throw) = P(tails in 1st throw) x P(tails in 2nd throw given tails in 1st throw).

With independence:

P(tails in 1st throw and tails in 2nd throw) = P(tails in 1st throw) x P(tails in 2nd throw)

Page 14

Applications: Dice

1. P(not throwing 4) = 1 – P(throwing 4)

2. P(throwing 4) + P(throwing 5) = P(throwing 4 or 5) = 2/6

3. P(throwing 4 in 1st throw and throwing 5 in 2nd throw) = P(throwing 4 in 1st throw) x P(throwing 5 in 2nd throw given throwing 4 in 1st throw).

With independence:

P(throwing 4 in 1st throw and throwing 5 in 2nd throw) = P(throwing 4 in 1st throw) x P(throwing 5 in 2nd throw)
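For two throws of a balanced six-sided die, the product rule under independence can be verified by listing all pairs of faces. This sketch assumes that the 36 pairs are equally likely, which is exactly the independence assumption; the faces 4 and 5 and the helper name `prob` are illustrative choices.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
# All 36 (first throw, second throw) pairs, assumed equally likely.
OUTCOMES = list(product(FACES, repeat=2))

def prob(predicate):
    """Share of the 36 outcomes for which predicate(first, second) is true."""
    hits = sum(1 for first, second in OUTCOMES if predicate(first, second))
    return Fraction(hits, len(OUTCOMES))

p_first_4 = prob(lambda first, second: first == 4)
p_second_5 = prob(lambda first, second: second == 5)
p_both = prob(lambda first, second: first == 4 and second == 5)

print(p_first_4, p_second_5, p_both)
assert p_both == p_first_4 * p_second_5  # the product rule holds
```

Any other pair of faces gives the same result, since each face is equally likely on each throw.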

Page 15

Inverse Probability Fallacy

• When asked about the probability of the disease given the symptom, P(disease | symptom), clinicians tend to answer with the probability of the symptom given the disease, P(symptom | disease).

There is an equal number of blue and green cabs in the capital of Happinistan. Having an accident is independent of the color of the cab.
• What is the probability that a taxi has been involved in an accident given that it is green?
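A small count table shows why the two conditional probabilities should not be confused. The fleet below is entirely hypothetical: the counts are made up so that colors are equally common and the accident rate (10%, an assumed figure) is the same for both colors, as the cab example requires.

```python
from fractions import Fraction

# Hypothetical fleet, counts invented for illustration only.
cabs = {
    ("green", "accident"): 50,
    ("green", "no accident"): 450,
    ("blue", "accident"): 50,
    ("blue", "no accident"): 450,
}

def count(color=None, status=None):
    """Number of cabs matching the given color and/or accident status."""
    return sum(
        n
        for (c, s), n in cabs.items()
        if (color is None or c == color) and (status is None or s == status)
    )

p_accident_given_green = Fraction(count("green", "accident"), count(color="green"))
p_green_given_accident = Fraction(count("green", "accident"), count(status="accident"))

print(p_accident_given_green)  # 1/10
print(p_green_given_accident)  # 1/2
```

Under these assumed counts, P(accident | green) equals the overall accident rate, while P(green | accident) equals the share of green cabs. Answering one when asked for the other is the inverse probability fallacy.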

Page 16

Gloms and Fizos

Page 17

Outline

1. Probabilities
   The Three Rules

2. Random Variable
   Expectation of a random variable

Next time: Probability Distributions (continued), Chapter 4 of A&F

Page 18

Random variable

A random variable is a variable whose value is not known ex-ante and can turn out to be any of several values ex-post.
• Example:
– X is a random variable that, before the coin is tossed (ex-ante), can take values « Heads » or « Tails ». Once the coin is tossed (ex-post), the value of X is known: it is either « Heads » or « Tails ».
– Y is a random variable that can take values 1, 2, 3, 4, 5, or 6 depending on the throw of a die. Before the die is thrown, the value is not known. After the die is thrown, we know the value of Y.

Page 19

Probability distribution of a random variable

• Take all possible values of a random variable Y:
– Example: 1, 2, 3, 4, 5, 6
– In general: y1, y2, y3, …, yK.
• The probability of the event that the random variable Y equals yk is denoted P(Y=yk) or simply P(yk).
• The probability distribution of random variable Y is the list of all values of P(Y=yk).
• Example: for a balanced die, the probability distribution of Y is the list of values P(Y=1), P(Y=2), P(Y=3), … which is {1/6, 1/6, 1/6, 1/6, 1/6, 1/6}.

Throughout the course we consider either discrete quantitative random variables or categorical random variables.

Page 20

Expected value of a random variable

What are your expected gains when playing the coin game?
• Gain is a random variable, equal to +10 AED when getting heads, and -10 AED when getting tails.

E(gain) = Gain when getting heads x Probability of heads + Gain when getting tails x Probability of tails.

In general, for a random variable Y, the expected value of Y is:
• E(Y) = Σ yk P(Y=yk), where the sum runs over k = 1, …, K.

Also note that probabilities sum to one:
Σ P(Y=yk) = 1
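The expected value formula translates directly into a weighted sum. In this sketch a probability distribution is represented as a dictionary from value to probability, which is a choice made for the example; the function name `expected_value` is likewise an assumption.

```python
from fractions import Fraction

def expected_value(distribution):
    """E(Y) = sum over k of y_k * P(Y = y_k), for a dict mapping value -> probability."""
    assert sum(distribution.values()) == 1, "probabilities must sum to one"
    return sum(value * p for value, p in distribution.items())

coin_game = {+10: Fraction(1, 2), -10: Fraction(1, 2)}  # gain in AED
balanced_die = {face: Fraction(1, 6) for face in range(1, 7)}

print(expected_value(coin_game))     # 0
print(expected_value(balanced_die))  # 7/2
```

The expected gain of the coin game is zero: in the long run, wins and losses of 10 AED cancel out on average.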

Page 21

Expected Earnings?

• « Your annual earnings right after NYU Abu Dhabi » is a random variable…
– The variable has not been realized yet.

Let’s give it a name:
Y = « Your annual earnings right after NYU Abu Dhabi ».
• E(earnings) = E(Y) = Σ yk P(Y=yk)
– Y takes potentially K values.
• Problemo: We don’t observe earnings in the future!

Page 22

Expected Earnings?

An approximation is to use the distribution of current graduates… to substitute for our lack of knowledge of P(Y=yk) for each k.
• Earnings take K distinct values: no two graduates earn exactly the same annual wage…
• Hence an approximation of expected earnings is
E(Y) ≈ Σ yk x (1/K)
• The average earnings of current graduates…
• But that’s only an approximation! What could be wrong?
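Giving each observed value the probability 1/K turns the expected value formula into a simple average. The earnings figures below are invented purely for illustration and do not describe any real graduates.

```python
from fractions import Fraction

# Made-up annual earnings (in USD) of K current graduates, for illustration only.
current_graduates = [38_000, 41_500, 45_000, 52_250, 60_000]

K = len(current_graduates)
# Each observed value y_k gets probability 1/K, so the approximation of E(Y)
# is the sample mean of the observed earnings.
approx_expected = sum(y * Fraction(1, K) for y in current_graduates)

print(float(approx_expected))
assert approx_expected == Fraction(sum(current_graduates), K)
```

The result is only as good as the assumption that future earnings are distributed like the earnings of current graduates.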

Page 23

Properties of the Expectation

The expectation of the sum is the sum of the expectations:
• E(earnings – debt) = E(earnings) – E(debt)

The expectation of a constant x the random variable is the constant x the expectation:
• E(Constant x Earnings) = Constant x E(Earnings)
E.g. E(Earnings in AED) = 3.6 x E(Earnings in USD)

Beware!
• E(XY) is not E(X) E(Y) in general.
• When X and Y are independent, E(XY) = E(X) E(Y).
• Law of conditional expectation: E(X) = E(E(X|Z))
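These properties can be checked on two independent throws of a balanced die. The setup is an assumption made for the example: X is the first throw, Y the second, Z is the event "the first throw is even", and the helper `expect` is a made-up name.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
PAIRS = list(product(FACES, repeat=2))  # two independent throws, 36 equally likely pairs

def expect(f):
    """E(f(X, Y)) where X is the first throw and Y the second."""
    return sum(Fraction(f(x, y)) for x, y in PAIRS) / len(PAIRS)

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)

assert expect(lambda x, y: x + y) == e_x + e_y  # E(X + Y) = E(X) + E(Y)
assert expect(lambda x, y: 3 * x) == 3 * e_x    # E(cX) = c E(X)
assert expect(lambda x, y: x * y) == e_x * e_y  # holds because X and Y are independent
assert expect(lambda x, y: x * x) != e_x * e_x  # fails for X with itself (not independent)

# Law of conditional expectation with Z = "first throw is even":
# average E(X | Z) over the two values of Z, each with probability 1/2.
e_x_given_even = Fraction(2 + 4 + 6, 3)
e_x_given_odd = Fraction(1 + 3 + 5, 3)
assert Fraction(1, 2) * e_x_given_even + Fraction(1, 2) * e_x_given_odd == e_x
```

The fourth check is the "beware" case: E(X x X) is 91/6, whereas E(X) x E(X) is 49/4.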

Page 24

Wrap Up

• Four rules of probability distributions
1. P(not A) = 1 – P(A)
2. P(A or B) = P(A) + P(B) when P(A and B) = 0
3. P(A and B) = P(A) P(B given A)
   Beware of the inverse probability fallacy: P(B given A) is not P(A given B).
3’. P(A and B) = P(A) P(B) when A and B are independent
• Random variable
– Variable whose value has not been realized.
• Probability distribution of a random variable
– List of the probabilities of the values of the random variable.
• Expected value of a random variable: E(Y) = Σ yk P(Y=yk)
– E(X+Y) = E(X) + E(Y), E(cX) = c E(X), E(X) = E(E(X|Z))

Page 25

Coming up:

Readings:
• Chapter 4 entirely: full of interesting examples and super relevant.
• Online quiz on Thursday night.
• No slide due on Thursday.

For help:

• Amine Ouazad
Office 1135, Social Science
[email protected]
Office hour: Tuesday from 5 to 6.30pm.

• GAF: Irene [email protected]
recitations. At the Academic Resource Center, Monday from 2 to 4pm.