ST2334

Probability and Statistics

Academic Year 2015/2016

Semester II

David Chew

Department of Statistics and Applied Probability

email: [email protected]

Typeset using the MiKTeX system.

Course Information

AIMS & OBJECTIVES

This module teaches students some fundamental concepts of probability, such as how to calculate the probability that A will happen given the knowledge that B has also happened, what random variables are, and what it means for them to be dependent or independent of each other. Students will learn some common probability distributions, how to estimate properties of a population that is only partially sampled, and how to evaluate hypotheses about that population, under the assumption that the characteristic being measured follows a Normal/Gaussian distribution.

SYLLABUS

Basic concepts of probability, conditional probability, independence, random variables, joint and marginal distributions, mean and variance, some common probability distributions, sampling distributions, estimation and hypothesis testing based on a normal population.

REFERENCE

• Jessica M. Utts, Robert F. Heckard, Mind on Statistics, 5th ed. Cengage Learning, 2015.

• Richard D. De Veaux, Paul F. Velleman, David E. Bock, Stats: Data and Models, 4th ed. Pearson Education Ltd, 2016.

LECTURER

Dr David Chew
Office: S16-06-108, Tel: 6516-5239
email: [email protected]

Please indicate the subject ST2334 when you email me.

TEACHING MODES

Concerning Lectures

• On Weeks 01 to 06, we meet every Monday and Thursday from 1000 to 1140 hrs in LT27.

• On Weeks 07 to 13, we will adopt a “flipped classroom” learning model for our classes. Lectures will be delivered online, so students are expected to watch the video lectures before coming in for tutorial classes physically.

– However, on 3 Mar, 17 Mar & 31 Mar, I will conduct “Example Sessions” in LT27 where extra examples not covered in the video lectures will be discussed.

– We then meet in LT27 for revision lectures on 11 Apr and/or 14 Apr to wrap up the course.

A word about WebCasts

Classes conducted in LT27 will be recorded as WebCasts and made available to all via IVLE. Students should use the lecture recordings in the right manner: as a means to revise materials that you find difficult after class, and not as an excuse to skip lectures.

A word about QuestionSMS

Students may ask questions anonymously via the QuestionSMS system during lectures. To do so, SMS your message to the number 77577 in the format <code> <message>. I will use the code ST2334 throughout the semester. For example, to ask “Why Example 1.5 is liddat?”, simply SMS

ST2334 Why Example 1.5 is liddat?

to the number 77577.

Concerning Tutorials

• On Weeks 03 to 12, every student will attend a 2-hour tutorial once a fortnight.

– You will sign up for one of the ten tutorial classes available.

– Tutorial groups T01 to T05 meet on Weeks 03, 05, 07, 09 & 11, while

– Tutorial groups T06 to T10 meet on Weeks 04, 06, 08, 10 & 12.

• We hope to give some time during tutorials for group discussions, so we will divide each tutorial class into several discussion groups. After the group discussions, one student from each group will then be asked to present a summary of their discussions to the whole class.

ASSESSMENT

Your final grade for the course will depend on

• tutorial participation (5%)

• several online quizzes (10%)

• a mid-semester examination (25%): multiple choice questions and/or short questions, to be held on 7 Mar 16 (Monday) OR 10 Mar 16 (Thursday), 1015 to 1115 hrs. Venue: to be announced.

• a 2-hour final examination (60%): open-ended questions, to be held on 25 Apr 16 (Monday), 0900 to 1100 hrs.

Both the mid-semester and final examinations are closed book, but students will be allowed to bring along ONE handwritten A4-size help sheet (two-sided).

You will note the low weightage given to the online quizzes and tutorial participation components. This is deliberate, as the purpose of these components is to encourage you

• to keep up with the lecture progress (online quizzes), and

• to be engaged in discussions during classes (tutorial participation).

GETTING HELP

• Ask your fellow students on the IVLE forum.

• E-mail me your question with the subject [ST2334]. The replies might be published on the IVLE module FAQ so that everyone can read them.

• Ask me questions during lecture breaks or at the end of lectures.

• Make an appointment to meet me during my consultation hours. Consultation will be held on Mondays and Thursdays, 1400 – 1530 hrs, in my office S16-06-108.

One

The Moral of the Story . . .

“This reminds me of a story. Long before your time in the Southern province of China . . . ”
Tan Ah Teck

1.1 WHY LEARN STATISTICS & PROBABILITY?

Statistics gets no respect. People say things like “You can prove anything with Statistics.” People will write off a claim based on data as “just a statistical trick.” The situation is not much better for the study of Probability. So why should you spend your time learning about subjects that sound as dull as these? With the following examples, we hope to convince you that learning about these subjects will be interesting and useful.

EXAMPLE 1.1 (BIRTHDAY PROBLEM I)

Ignoring the issues of leap years, twins and seasonal variation in fertility, what is the chance that in a room of 25 people there are at least two individuals who share a common birthday?

(a) Less than 1 in 1000.

(b) At least 1 in 1000 and less than 1 in 100.

(c) At least 1 in 100 and less than 1 in 10.

(d) At least 1 in 10 and less than 1 in 2.

(e) At least 1 in 2.

The probability of that happening is about 0.57. Does that surprise you?
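The figure of about 0.57 can be checked with a few lines of code. The following is a minimal sketch, assuming 365 equally likely birthdays (as the example does by ignoring leap years and seasonal effects); the function name `prob_shared_birthday` is made up for illustration.

```python
def prob_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), assuming `days`
    equally likely birthdays and independence between people."""
    p_all_distinct = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


print(round(prob_shared_birthday(25), 2))  # 0.57
```

The trick is to compute the complement: it is easier to find the chance that all 25 birthdays are different and subtract it from 1.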

EXAMPLE 1.2 (BIRTHDAY PROBLEM II)

How many people must you ask in order to have a 50:50 chance of finding someone who shares your birthday? A number

(a) Less than 25,

(b) Between 25 and 49,

(c) Between 50 and 99,

(d) Between 100 and 199,

(e) At least 200.

Many of you will think that the answer is (b) or (d). And you will be wrong! This time round the surprise is that we need many more people than the seemingly reasonable figure of 182 (about half of 365). The answer is 253.
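A short search confirms the answer of 253. This is a sketch under the same assumption of 365 equally likely birthdays, with each person asked being independent of the others; the function names are hypothetical.

```python
def prob_someone_shares_mine(n, days=365):
    """P(at least one of n people has MY birthday), assuming each person's
    birthday is uniform over `days` days and independent of the others."""
    return 1.0 - ((days - 1) / days) ** n


def people_needed(target=0.5, days=365):
    """Smallest n for which the probability reaches `target`."""
    n = 1
    while prob_someone_shares_mine(n, days) < target:
        n += 1
    return n


print(people_needed())                          # 253
print(round(prob_someone_shares_mine(182), 2))  # 0.39
```

Asking 182 people gives a chance of only about 0.39, because many of the people you ask will share birthdays with one another rather than with you.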

EXAMPLE 1.3 (THE MONTY HALL PROBLEM)

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats.

You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?”

Is it to your advantage to switch your choice?

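Since door 3 is now opened, the probability that the car is behind door 1 should be 0.5, you argue. So it does not matter whether you switch or not. You will be in good company, but nonetheless wrong. The fact is that you win 2/3 of the time when you switch: your first pick is right only 1/3 of the time, and whenever it is wrong, the host has no choice but to leave the car behind the remaining closed door.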
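If the argument still feels slippery, every case can be enumerated exactly. The sketch below assumes the car is placed uniformly at random, your first pick is uniform, and the host chooses uniformly between the two goat doors when he has a choice; these assumptions and the function name `win_probability` are ours, not part of the original puzzle statement.

```python
from fractions import Fraction
from itertools import product

DOORS = (1, 2, 3)


def win_probability(switch):
    """Exact probability of winning the car under the stated assumptions."""
    wins = Fraction(0)
    # 9 equally likely (car position, first pick) combinations.
    for car, pick in product(DOORS, repeat=2):
        # The host may only open a door that is neither your pick nor the car.
        openable = [d for d in DOORS if d != pick and d != car]
        for opened in openable:
            weight = Fraction(1, 9) / len(openable)
            final = pick
            if switch:
                # Switch to the one door that is neither picked nor opened.
                final = next(d for d in DOORS if d != pick and d != opened)
            if final == car:
                wins += weight
    return wins


print(win_probability(switch=True))   # 2/3
print(win_probability(switch=False))  # 1/3
```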

EXAMPLE 1.4 (AUTISM TESTS)

British scientists have developed a 15-minute brain scan they hope could be used to detect autism in children. Autism and related disorders affect up to seven out of every 1,000 individuals. The brain scan method was 90 percent accurate in correctly identifying the autistic patients. It also showed a negative result for healthy controls in 80 percent of cases. If a child tested positive for autism, what is the probability that he indeed has autism?

(a) Less than 10%.

(b) At least 10% and less than 30%.

(c) At least 30% and less than 70%.

(d) At least 70% and less than 90%.

(e) At least 90%.

You look at the accuracy of the autism test (90 percent; 80 percent) and conclude that the probability should be quite high. Surprisingly, the answer is actually rather low: about 0.03, in fact.
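The value of about 0.03 follows from Bayes' rule. The sketch below assumes that the 90 percent figure is the chance of a positive scan for an autistic child, the 80 percent figure is the chance of a negative scan for a healthy child, and the prevalence is taken as exactly 7 in 1,000; the function name is hypothetical.

```python
def prob_condition_given_positive(prevalence, sensitivity, specificity):
    """Bayes' rule: P(condition | positive test)."""
    true_positive = prevalence * sensitivity
    false_positive = (1 - prevalence) * (1 - specificity)
    return true_positive / (true_positive + false_positive)


p = prob_condition_given_positive(prevalence=7 / 1000,
                                  sensitivity=0.90,
                                  specificity=0.80)
print(round(p, 3))  # 0.031
```

Out of every 1,000 children, roughly 6 true positives are swamped by about 199 false positives from the 993 healthy children, which is why a positive result is still far more likely to be a false alarm.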

THE MORAL OF THE STORY

Examples 1.1 to 1.4 show that our intuitions about uncertainty can often be misleading. Perhaps we should do some probability after all . . .

EXAMPLE 1.5 (WHO ARE THOSE SPEEDY DRIVERS?)

A survey taken in a large class at Penn State University contained the question “What’s the fastest (in mph) you have ever driven a car?” The data provided by the 87 males and 102 females who responded are listed here.

Males:
110 109 90 140 105 150 120 110 110 90 115 95 145 140
110 105 85 95 100 115 124 95 100 125 140 85 120 115
105 125 102 85 120 110 120 115 94 125 80 85 140 120
92 130 125 110 90 110 110 95 95 110 105 80 100 110
130 105 105 120 90 100 105 100 120 100 100 80 100
120 105 60 125 120 100 115 95 110 101 80 112 120 110
115 125 55 90

Females:
80 75 83 80 100 100 90 75 95 85 90 85 90 90 120 85
100 120 75 85 80 70 85 110 85 75 105 95 75 70 90 70
82 85 100 90 75 90 110 80 80 110 110 95 75 130 95 110
110 80 90 105 90 110 75 100 90 110 85 90 80 80 85 50
80 100 80 80 80 95 100 90 100 95 80 80 50 88 90 90 85
70 90 30 85 85 87 85 90 85 75 90 102 80 100 95 110 80
95 90 80 90

From these numbers, can you tell which sex tends to have driven faster and by how much?

What if I now show you the following?

[Figure: dotplots of the fastest speed ever driven, by sex]

The plot above is known as a dotplot, where each dot represents the response of an individual student.

Another summary:

[Table: five-number summary of the fastest speed ever driven, by sex]

The table above displays the five-number summary for each sex: the lowest value, the cutoff points for 1/4, 1/2, and 3/4 of the data, and the highest value.
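The five-number summary can be recomputed directly from the listed responses. This is a minimal sketch using Python's standard `statistics` module; quartile conventions differ slightly between textbooks and software, although for these data the common conventions give the same cutoffs. The function name `five_number_summary` is our own.

```python
import statistics

# Responses as listed in Example 1.5 (fastest speed ever driven, in mph).
males = [
    110, 109, 90, 140, 105, 150, 120, 110, 110, 90, 115, 95, 145, 140,
    110, 105, 85, 95, 100, 115, 124, 95, 100, 125, 140, 85, 120, 115,
    105, 125, 102, 85, 120, 110, 120, 115, 94, 125, 80, 85, 140, 120,
    92, 130, 125, 110, 90, 110, 110, 95, 95, 110, 105, 80, 100, 110,
    130, 105, 105, 120, 90, 100, 105, 100, 120, 100, 100, 80, 100,
    120, 105, 60, 125, 120, 100, 115, 95, 110, 101, 80, 112, 120, 110,
    115, 125, 55, 90,
]
females = [
    80, 75, 83, 80, 100, 100, 90, 75, 95, 85, 90, 85, 90, 90, 120, 85,
    100, 120, 75, 85, 80, 70, 85, 110, 85, 75, 105, 95, 75, 70, 90, 70,
    82, 85, 100, 90, 75, 90, 110, 80, 80, 110, 110, 95, 75, 130, 95, 110,
    110, 80, 90, 105, 90, 110, 75, 100, 90, 110, 85, 90, 80, 80, 85, 50,
    80, 100, 80, 80, 80, 95, 100, 90, 100, 95, 80, 80, 50, 88, 90, 90, 85,
    70, 90, 30, 85, 85, 87, 85, 90, 85, 75, 90, 102, 80, 100, 95, 110, 80,
    95, 90, 80, 90,
]


def five_number_summary(values):
    """Return (lowest, lower quartile, median, upper quartile, highest)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


for label, data in (("Males", males), ("Females", females)):
    print(label, len(data), five_number_summary(data))
# Males 87 (55, 95.0, 110.0, 120.0, 150)
# Females 102 (30, 80.0, 89.0, 95.0, 130)
```

The median for males (110 mph) is 21 mph higher than the median for females (89 mph), a comparison that is hard to see in the raw lists.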

THE MORAL OF THE STORY

Simple summaries of data can tell an interesting story and are easier to digest than long lists.

EXAMPLE 1.6 (DID ANYONE ASK WHOM YOU’VE BEEN DATING?)

Consider the following newspaper headlines:

“According to a new USA Today/Gallup Poll of teenagers across the country, 57 percent of teens who go out on dates say they’ve been out with someone of another race or ethnic group.” (Peterson, 1997)

This prompts the Sacramento Bee to proclaim:

“Interracial dates common among today’s teenagers.”

There are millions of teenagers in the U.S. Did the polltakers ask all of them? No. The article states that

“the results of the new poll of 602 teens, conducted Oct 13 – 20, reflect the ubiquity of interracial dating today . . . ”

So the pollsters asked only 602 teens.

Could such a small sample tell us anything about the millions of teenagers in the U.S.? Yes . . .

if those teens constituted a random sample from the population.

How accurate could this sample be? Apparently, the margin of error is about 5%. This means that the percentage of all dating teenagers in the US who would say they have dated interracially is likely to be in the range 57% ± 5%, or between 52% and 62%.
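To get a feel for where a margin of error of this size comes from, here is a rough sketch using two standard approximations for a simple random sample: the conservative rule 1/sqrt(n) and the usual 95% formula 1.96 * sqrt(p(1 - p)/n). Both are assumptions on our part; the text does not say how the quoted 5% was obtained. With n = 602 they give roughly 4%, in the same ballpark as the quoted figure, which would be somewhat larger if only the respondents who date were counted.

```python
import math


def conservative_margin(n):
    """Rule-of-thumb 95% margin of error for a proportion: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)


def approximate_margin(p_hat, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


n = 602       # teens polled, as stated in the article
p_hat = 0.57  # reported sample proportion

print(f"conservative: {conservative_margin(n):.3f}")        # 0.041
print(f"approximate:  {approximate_margin(p_hat, n):.3f}")  # 0.040
```

Notice that neither formula involves the size of the population: the accuracy depends on the sample size, not on how many millions of teenagers there are.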

THE MORAL OF THE STORY

A representative sample of only a few thousand, or perhaps even a few hundred, can give reasonably accurate information about a population of many millions.

EXAMPLE 1.7 (WHO ARE THOSE ANGRY WOMEN?)

Shere Hite sent questionnaires to 100,000 women asking about love, sex, and relationships. Only 4.5% of the women responded, and Hite used those responses to write her book, Women and Love.

As Moore notes,

“The women who responded were fed up with men and eager to fight them. For example, 91% of those who were divorced said that they had initiated the divorce. The anger of women toward men became the theme of the book.”

Moore (1997, p. 11)

The Hite sample exemplifies one of the most common problems with surveys: The sample data may not represent the population. Extensive nonparticipation (i.e., nonresponse) from a random sample, or the use of a self-selected (i.e., all-volunteer) sample, will probably produce biased results.

THE MORAL OF THE STORY

An unrepresentative sample, even a large one, tells you almost nothing about the population.

EXAMPLE 1.8 (DOES PRAYER LOWER BLOOD PRESSURE?)

A headline in USA Today read,

“Prayer can lower blood pressure” (Davis, 1998)

The story that followed states, “Attending religious services lowers blood pressure more than tuning into religious TV or radio, a new study says.”

The report is based on an observational study, which followed 2391 people for 6 years.

“People who attended a religious service once a week and prayed or studied the Bible once a day were 40% less likely to have high blood pressure than those who don’t go to church every week and prayed and studied the Bible less.”

Researchers did observe a relationship, but it’s a mistake to conclude that prayer actually causes lower blood pressure.

In observational studies, groups can differ in important ways that may contribute to the observed relationship. People who attended church regularly may have

• been less likely to smoke or drink alcohol;

• had a better social network;

• been somewhat healthier and able to go to church.

These other factors are possible confounding variables.

THE MORAL OF THE STORY

Cause-and-effect conclusions cannot generally be made on the basis of an observational study.

EXAMPLE 1.9 (DOES ASPIRIN REDUCE HEART ATTACK RATES?)

Consider the Physicians’ Health Study (1988), a 5-year randomized experiment. The purpose of the experiment was to determine whether taking aspirin reduces the risk of a heart attack.

• 22,071 male physicians of age 40 – 84;

• randomly assigned to one of two treatment groups;

• Group 1 = aspirin every other day; Group 2 = placebo;

• Physicians blinded as to which group they were in.

The results support the conclusion that taking aspirin does indeed help to reduce the risk of having a heart attack:
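As a rough illustration of how such results are compared, the sketch below computes heart attack rates per 1,000 physicians in each group. The counts used (104 heart attacks among 11,037 physicians on aspirin; 189 among 11,034 on placebo) are the commonly reported figures for this study and are an assumption here, since they are not listed in the text; check them against the original results table.

```python
# Assumed counts: commonly reported figures for the study, not taken from the text.
aspirin_attacks, aspirin_n = 104, 11_037
placebo_attacks, placebo_n = 189, 11_034


def rate_per_1000(events, group_size):
    """Number of events per 1,000 participants in a group."""
    return 1000 * events / group_size


aspirin_rate = rate_per_1000(aspirin_attacks, aspirin_n)
placebo_rate = rate_per_1000(placebo_attacks, placebo_n)
relative_risk = aspirin_rate / placebo_rate

print(f"aspirin: {aspirin_rate:.2f} heart attacks per 1000")  # 9.42
print(f"placebo: {placebo_rate:.2f} heart attacks per 1000")  # 17.13
print(f"relative risk: {relative_risk:.2f}")                  # 0.55
```

Under these counts the two group sizes add up to the 22,071 physicians mentioned above, and the heart attack rate in the aspirin group is a little over half that in the placebo group.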

Because the men in this experiment were randomly assigned to the two conditions, other important risk factors such as age, amount of exercise, and dietary habits should have been similar for the two groups. This makes it possible to conclude that taking aspirin actually caused the lower rate of heart attacks for that group.

In a later chapter, we will learn how to determine that the difference seen in this sample is statistically significant.

THE MORAL OF THE STORY

Cause-and-effect conclusions can generally be made on the basis of randomized experiments.

The above stories were meant to bring life to our definition of statistics and how it is related to probability.

Statistics is a collection of procedures and principles for gathering data and analyzing information to help people make decisions when faced with uncertainty. Probability helps us quantify randomness/uncertainty.

Think back over the stories. In every story, data are used to make a judgment about a situation. This common theme is what statistics is all about.