
Post on 22-Mar-2020


Extra Exercises Basic Statistics:

Exercise 1:

Data results.txt (see Results on web-page: Goodyear)

This dataset contains the results of students from the following study disciplines some years ago: Chemistry, Biology and Geography. The variables are as follows:

length : height of the student in cm;

gender : Male (M) or Female (F);

high_school : study results of the last year in high school (in percentages);

bachelor : study results of the first bachelor year (in percentages);

study_direction : Chemistry (Ch), Biology (B) or Geography (G);

color : preferred car color: Light (L), Dark (D) or Red (R).

Check if the bachelor score is significantly higher than the high school score.


Exercise 2:

Data chol.txt

The data contain information about the cholesterol levels of 200 people.

AGE : age of the person;

HEIGHT : body height;

WEIGHT : body weight;

CHOL : cholesterol level;

SMOKE : nosmo/pipe/sigare;

BLOOD : blood group a/ab/b/o;

MORT : alive/dead;

and other variables.

a. Make a box plot of the cholesterol level for the smoker and non-smoker groups.

b. Check if the average cholesterol level of the smokers is significantly different from that of the non-smokers:

H0: µsm = µnon-sm vs. H1: µsm ≠ µnon-sm


Solution:

Exercise 1:

H0: µbach = µh_sch vs. H1: µbach > µh_sch

Step 1:

Create a new variable D = bachelor - high_school

Step 2: Reformulate the hypothesis

H0: µD = 0 vs. H1: µD > 0

Step 3: Check the normality of the variable D:

On the data window:

Analyze → Distribution: Select Y, Columns: D

On the D menu: Continuous Fit → Normal

On the D menu: Normal Quantile Plot

On the Fitted Normal menu: Goodness of Fit


On the histogram and Q-Q plot we do not see a departure from normality. From the Shapiro-Wilk test we obtain p = 0.7028 > 0.05. Hence, we have no reason to reject normality.
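The normality check described above can also be sketched outside JMP. The following is a minimal Python illustration, assuming NumPy and SciPy are available. The scores are synthetic stand-ins (the contents of results.txt are not reproduced here), so the printed p-value will not match the 0.7028 reported above.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for results.txt; these are NOT the real student scores.
rng = np.random.default_rng(0)
high_school = rng.normal(loc=65.0, scale=8.0, size=60)
bachelor = high_school + rng.normal(loc=-3.0, scale=6.0, size=60)

# Step 1: paired differences, D = bachelor - high_school.
d = bachelor - high_school

# Step 3: Shapiro-Wilk test. H0 states that D comes from a normal distribution.
w_stat, p_value = stats.shapiro(d)
print(f"W = {w_stat:.4f}, p = {p_value:.4f}")

if p_value > 0.05:
    print("No reason to reject normality of D.")
else:
    print("Normality of D is rejected at the 5% level.")
```

A histogram or Q-Q plot of `d` should still be inspected alongside the test, as in the JMP steps.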

Step 4:

Since D is approximately normal, we can apply a paired t-test (equivalent to a one-sample t-test on D) to test the significance of the mean difference:

On the data window:

Analyze → Matched Pairs


JMP computes the difference as high_school - bachelor, which is the negative of D. Hence, we reformulate H1 as follows:

H1: µh_sch < µbach or, equivalently, the mean of the JMP difference (high_school - bachelor) is below 0.


Then the corresponding p-value is “Prob<t”, which equals 1.00. So, we do not reject H0: the bachelor score is not significantly higher than the high school score.

Remark: the same result is obtained from a one-sample t-test on the difference D = bachelor - high_school. Here, we test H0: µD = 0 vs. H1: µD > 0, as originally formulated.

On the Distribution window:

In the D menu: Test Mean
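The equivalence between the matched-pairs test and the one-sample test on D can be illustrated with a short Python sketch, assuming SciPy 1.6 or later (for the `alternative` argument). The data are synthetic, so the p-values are illustrative only. The point is that the direction of the alternative must follow the direction of the difference.

```python
import math
import numpy as np
from scipy import stats

# Synthetic stand-in for results.txt; these are NOT the real student scores.
rng = np.random.default_rng(1)
high_school = rng.normal(65.0, 8.0, size=60)
bachelor = high_school + rng.normal(-3.0, 6.0, size=60)
d = bachelor - high_school

# One-sample test on D: H0: mu_D = 0 vs. H1: mu_D > 0.
one_sample = stats.ttest_1samp(d, popmean=0.0, alternative="greater")

# Paired test with the same direction (bachelor - high_school).
paired = stats.ttest_rel(bachelor, high_school, alternative="greater")

# Difference taken the other way round (high_school - bachelor):
# the alternative flips to "less", and the p-value stays the same.
flipped = stats.ttest_rel(high_school, bachelor, alternative="less")

print(f"one-sample on D:         p = {one_sample.pvalue:.4f}")
print(f"paired, 'greater':       p = {paired.pvalue:.4f}")
print(f"paired flipped, 'less':  p = {flipped.pvalue:.4f}")
```

All three calls test the same hypothesis, so their p-values agree up to floating-point rounding.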


Exercise 2:

(a.)

Step 1: Create a variable Sm_Status with 2 levels: Smoker/Non-Smoker:

Make a new column Sm_Status.

On the variable window:

Column Properties → Formula

Edit Formula

On the formula window:

Functions (grouped) → Conditional: Select: Match

Table Columns: Select: Smoke

Bottom: ^ (= insert)
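The recoding done with the Match function can be sketched in Python as a simple lookup table. This is an illustration only: it assumes that nosmo maps to Non-Smoker and that both pipe and sigare map to Smoker, which the exercise implies but does not spell out. The sample values are hypothetical.

```python
# Assumed mapping from the three SMOKE levels to the two Sm_Status levels.
SM_STATUS = {
    "nosmo": "Non-Smoker",
    "pipe": "Smoker",
    "sigare": "Smoker",
}

def smoking_status(smoke):
    """Return the Sm_Status level for one SMOKE value (KeyError if unknown)."""
    return SM_STATUS[smoke]

# Hypothetical SMOKE column; not taken from chol.txt.
smoke_column = ["nosmo", "pipe", "sigare", "nosmo"]
sm_status = [smoking_status(value) for value in smoke_column]
print(sm_status)
```

Raising an error on an unknown level, rather than silently assigning a default, makes typos in the SMOKE column visible.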


Make a grouped box plot:

Analyze → Fit Y by X: Y, Response: Chol; X, Factor: Sm_Status

In the Oneway menu: Display Options → Box Plots
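As a rough numerical counterpart to the grouped box plot, the sketch below computes the five values a basic box plot displays (minimum, quartiles, median, maximum) for each Sm_Status group, using only the Python standard library. The rows are hypothetical, and the quartile method chosen here may differ slightly from the one JMP uses.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a basic box plot displays."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical (Sm_Status, Chol) rows; not taken from chol.txt.
rows = [
    ("Smoker", 210), ("Smoker", 250), ("Smoker", 230),
    ("Smoker", 310), ("Smoker", 270),
    ("Non-Smoker", 190), ("Non-Smoker", 220),
    ("Non-Smoker", 205), ("Non-Smoker", 240),
]

# Group the cholesterol values by smoking status.
groups = {}
for status, chol in rows:
    groups.setdefault(status, []).append(chol)

for status, values in groups.items():
    print(status, five_number_summary(values))
```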


(b.)

Step 1:

Make a bar plot to get an idea of the sample sizes:

Graph → Chart: Statistics: N(Sm_Status)

The data contain 49 non-smokers and 151 smokers. Both sample sizes are larger than 30, but the groups are very unequal in size. In this case, if the distributions are skewed, a t-test is not well suited for comparing the means. Hence, we will check normality.


Step 2: H0: µsm = µnon-sm vs. H1: µsm ≠ µnon-sm

First, split the column:

Tables → Split: Split Columns: Chol; Split By: Sm_Status

Then, test normality:

The smokers group is skewed to the right. We will try to transform the data to improve normality. Try a square root transformation:


After the transformation, normality is satisfactory. Hence, we will apply a t-test on the transformed data.
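Why a square root helps with right skew can be illustrated with a small standard-library sketch. It draws a synthetic right-skewed sample (a log-normal stand-in, not the chol.txt values) and compares the moment-based skewness before and after the transformation. The function name `sample_skewness` is introduced here for illustration only.

```python
import math
import random
import statistics

def sample_skewness(values):
    """Moment-based skewness m3 / m2**1.5; positive means a longer right tail."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5

# Synthetic right-skewed stand-in for the smokers' cholesterol values.
rng = random.Random(0)
chol_smokers = [rng.lognormvariate(math.log(220.0), 0.35) for _ in range(151)]

# Square root transformation, as in the step above.
sqrt_chol = [math.sqrt(v) for v in chol_smokers]

print(f"skewness of Chol:       {sample_skewness(chol_smokers):.3f}")
print(f"skewness of sqrt(Chol): {sample_skewness(sqrt_chol):.3f}")
```

Because the square root compresses large values more than small ones, it pulls in the right tail. Whether the result is close enough to normal still has to be checked with a plot or a normality test.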

Step 3: Transform the Chol data to sqrt(Chol), stored as Sqrt_Chol.

Now, we will test the following:

H0: µsqrt_sm = µsqrt_non-sm vs. H1: µsqrt_sm ≠ µsqrt_non-sm.

Any conclusion applies only to the means of the square-root-transformed measurements, not to the means on the original scale.

Step 4: Check the equality of the variances.

H0: σsqrt_sm = σsqrt_non-sm

Analyze → Fit Y by X


In the Oneway window: Unequal Variances

Since p = 0.0126 < 0.05, we reject the equality of the variances. We will apply a t-test for unequal variances.
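A test for equal variances can be sketched in Python as follows, assuming NumPy and SciPy. The handout does not say which of the tests in JMP's Unequal Variances report produced p = 0.0126, so the Brown-Forsythe variant of Levene's test used here is an assumption chosen for illustration. The data are synthetic and the printed p-value will differ.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the two Sqrt_Chol groups; NOT the chol.txt values.
rng = np.random.default_rng(2)
sqrt_smokers = np.sqrt(rng.lognormal(mean=np.log(230.0), sigma=0.30, size=151))
sqrt_non_smokers = np.sqrt(rng.lognormal(mean=np.log(220.0), sigma=0.20, size=49))

# H0: both groups have the same variance.
# center="median" gives the Brown-Forsythe variant of Levene's test.
stat, p_value = stats.levene(sqrt_smokers, sqrt_non_smokers, center="median")
print(f"statistic = {stat:.4f}, p = {p_value:.4f}")

# Decide which t-test to use at the 5% level.
equal_var = bool(p_value >= 0.05)
print("pooled t-test" if equal_var else "t-test for unequal variances")
```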

Step 5: t-test for unequal variances

Since p = 0.1337 > 0.05, we do not reject H0: µsqrt_sm = µsqrt_non-sm. The mean square-root-transformed cholesterol levels of the smokers and the non-smokers are not significantly different.
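The t-test for unequal variances (Welch's test) can be sketched in Python as below, assuming NumPy and SciPy. The data are synthetic stand-ins, so the p-value will not equal 0.1337. The statistic and the Welch-Satterthwaite degrees of freedom are also computed by hand to show what the library call does.

```python
import math
import numpy as np
from scipy import stats

# Synthetic stand-ins for the two Sqrt_Chol groups; NOT the chol.txt values.
rng = np.random.default_rng(3)
sqrt_smokers = np.sqrt(rng.lognormal(mean=np.log(230.0), sigma=0.30, size=151))
sqrt_non_smokers = np.sqrt(rng.lognormal(mean=np.log(220.0), sigma=0.20, size=49))

# Welch's two-sided t-test: equal_var=False drops the pooled-variance assumption.
welch = stats.ttest_ind(sqrt_smokers, sqrt_non_smokers, equal_var=False)

# The same statistic by hand: each group contributes s^2 / n to the variance.
n_s, n_n = sqrt_smokers.size, sqrt_non_smokers.size
se2_s = sqrt_smokers.var(ddof=1) / n_s
se2_n = sqrt_non_smokers.var(ddof=1) / n_n
t_manual = (sqrt_smokers.mean() - sqrt_non_smokers.mean()) / math.sqrt(se2_s + se2_n)

# Welch-Satterthwaite approximation of the degrees of freedom.
df_manual = (se2_s + se2_n) ** 2 / (se2_s**2 / (n_s - 1) + se2_n**2 / (n_n - 1))

print(f"t = {welch.statistic:.4f}, p = {welch.pvalue:.4f}")
print(f"manual t = {t_manual:.4f}, approx. df = {df_manual:.1f}")
```

As in the handout, a conclusion drawn from this test concerns the means of the square-root-transformed values only.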