lab session 1 - cengage learning


Post on 10-Feb-2022

Page 1: LAB SESSION 1 - Cengage Learning

Technology Guide for Elementary Statistics 11e: Minitab

CHAPTER 1 - LAB SESSION INTRODUCTION TO MINITAB

INTRODUCTION: This lab session is designed to introduce you to the statistical software MINITAB. During this session you will learn how to enter and exit MINITAB, how to enter data and commands, how to print information, and how to save your work for use in subsequent sessions. As with any new skill, using this software will require practice and patience.

BEGINNING AND ENDING A MINITAB SESSION

To start MINITAB

From the taskbar, choose Start > Programs > MINITAB 14 > MINITAB.

To exit MINITAB

To end a MINITAB session and exit the program, choose File from the menu bar and then choose Exit. A dialog box will appear, asking if you want to save the changes made to this worksheet. Click Yes or No. It is also possible to exit MINITAB by clicking the X in the upper right corner of the window.


In MINITAB, there are three ways to access commands: with menus, the Toolbar, and session commands. The Toolbar is a quick way to issue commands. When you click a button, MINITAB performs an action or opens a dialog box, exactly like the corresponding menu command. To be able to use session commands we must enable the command language editor. To do this, choose Editor > Enable Command Language. Session commands are alternatives to menu commands that you can type in the session window or in the command line editor.

MINITAB WINDOWS

The main MINITAB window opens when you first start MINITAB. You will be in a window titled “MINITAB - Untitled”, within which a split window is shown: one part titled “Session” and the other titled “Worksheet 1”. The Session window displays text output such as tables of statistics. Data windows are where you enter, edit, and view the column data for each worksheet. Another window in the MINITAB environment, accessed through the Window menu, is the Project Manager. The Project Manager summarizes each open worksheet. Within the Project Manager, the History window records all the commands you have used. Graph windows display graphs.

Session Window

The data window is active when you first start MINITAB. To move to the Session Window just point the mouse to the Session Window and click. In older versions of Minitab, whenever you issue a command from a menu, its corresponding Session command appears in the Session window. In version 14, the command will appear in the History folder within the Project Manager and will only appear in the session window if you have enabled the command language. You can also type Session commands directly into the Session window at the MTB> prompt. Throughout these labs, the same typographical conventions will be used as in Johnson/Kuby’s Elementary Statistics, 11/e.

The Help Window in MINITAB

Information about MINITAB is stored in the computer. If you forget how to use a command or subcommand, or need general information, you can ask MINITAB for help. There are three methods for accessing Help: choose Help from the menu, select “?” from the toolbar, or press F1. It would be beneficial for you to read “How to use Minitab Help” the first time you enter the program to help you understand the structures used in Minitab.


Students: Practice using the HELP command by typing the following and reading what is presented on the screen:

Menu Commands Choose: Help > Help

Select: Index Help on Enter: MEAN

The Data Window

Close Help and click in the worksheet.

The worksheet is arranged by rows and columns. The columns C1, C2, C3, . . . , correspond to the variables in your data, the rows to observations. In general, a column contains all the data for one variable, and a row contains all the data for an individual subject or observation. You can refer to the columns as C1, C2, or by giving them descriptive names. Click in the column name cell (the blank space below the column number). Name column 2 “Test 1”, column 3 “Test 2”, column 4 “Test 3”, and column 5 “Average”.

ENTERING DATA

Now that we are in the data window, let's enter data in the second column:

78 94 93 81 75 62 58 50 80 79

To do this, type each value and press the down arrow key (↓) or Enter to move to the next entry position.

Suppose we wish to create a column that contains the integers 1 to 10. Although we could enter these numbers directly into the Data window by typing, there is a much easier way:

Menu Commands
Choose: Calc > Make Patterned Data > Simple Set of Numbers
Enter: Store patterned data in: C1
       From first value: 1
       To last value: 10
Click: OK

NOTE: Use the Tab key to cycle through the prompts in the dialog box.


This is what the menu choices look like:

Column 1 should now contain the integers 1 through 10. While you are in the data window, fill columns 3 and 4 with a set of ten test scores each. You should now have four columns of data.

Changing a value entered

We can edit data directly in the data window. Let's suppose we had incorrectly entered the third data item in the second column. It should have been a 73. Click cell C2 row 3 to make it active. Type in the correct value and press enter. Double-clicking allows insertion of new characters without retyping the entire entry.

Suppose we had inadvertently left out a value and we wish to enter it in a particular position. Place the cursor in the cell in which you wish to insert the new value. Click the Insert Cells button on the toolbar. A blank cell is created and the missing value can be entered.


A cell can be deleted by making the cell active and then choosing Edit > Delete Cells (or pressing the Del key). Rows of values can also be inserted or deleted in a similar manner. The menu command to insert a row is only functional when the data window is active and a row is active. To make a row active, click the row header (i.e., the row number). An empty row will be added above the active row in the Data window and the remaining rows will be moved down.

Menu Commands
Choose: Editor > Insert Row

To print your data, choose File > Print Worksheet, make the appropriate selections, and click OK.

Suppose we wish to copy a column into another column. We can use the COPY command instead of reentering the data.

Choose: Data > Copy > Copy columns to columns
Enter: Copy from columns: TEST1
Select: Store Copied Data in columns (use the drop-down arrow to select Column)
Click: OK

To erase an entire column we use the ERASE command.

Menu Commands
Choose: Data > Erase Variables
Enter: Columns and constants: select the appropriate variable
Click: OK

SAVING YOUR WORK

A MINITAB project contains all of your work: the data, text output from the commands, graphs, and more. When you save a project, you save all of your work at once. When you open a project, you can pick up right where you left off. The project’s many pieces can be handled individually. You can create data, graphs, and output from within MINITAB. You can also add data and graphs to the project by copying them from files. The contents of most windows can be saved and printed separately from the project, in a variety of formats. You can also discard a worksheet or graph, which removes the item from the project without saving it. Let’s save the project and name it “Intro”. Be sure to note where you are saving it.


To open, save, or close a project

To open a new project, choose File > New, click Project, and click OK. To open a saved project, choose File > Open Project. To save a project, choose File > Save Project.

To close a project, you must open a new project, open a saved project, or exit MINITAB.

RETRIEVING A FILE

To retrieve the project that we saved in the previous session:

Menu Commands
Choose: File > Open Project
Click: Look in drop-down list arrow
Locate the file
Double-click: INTRO.MPJ
Click: OPEN

The data window now displays the test data you saved previously.

A CD ROM accompanies Johnson/Kuby’s Elementary Statistics, 11/e. Follow the instructions that accompany the disk for use on your computer.


ASSIGNMENT:

1. Create a data file on your disk that consists of the heights of 15 of your classmates (in column 1) and their weights (in column 2).

2. Retrieve the data file created in #1 above, and produce a paper copy (commonly called a 'hard copy') to hand in.

3. Retrieve the data file for Exercise 2.23 from the Student’s Suite CD-ROM that accompanies the Johnson/Kuby text, and print a hard copy to hand in.


CHAPTER 2 - LAB SESSION 1 GRAPHIC PRESENTATION OF UNIVARIATE DATA

INTRODUCTION: Graphically representing data is one of the most helpful ways to become acquainted with the sample data. In this lab you will use MINITAB to present data graphically. You will be analyzing data using six types of graphs: pie charts, Pareto diagrams, dot plots, stem-and-leaf displays, histograms, and cumulative (relative) frequency plots (ogives).

GRAPHIC PRESENTATIONS OF DATA

There are several ways to display a picture of the data. These graphical displays help us get acquainted with the data and begin to get a feel for how the data are distributed. To see what is available to you, use the menu bar to select Graph. Note the different types of graphs that are listed there. We will use the menu bar to make our selections.

PIE CHARTS

Circle graphs (pie diagrams) show the amount of data that belong to each category as a proportional part of a circle. Consider Example 2.1. We are instructed to construct a pie chart, with data presented as a frequency distribution. Enter the data into the worksheet.

Menu Commands
Choose: Graph > Pie Chart
Click: Chart value from a table
Enter: Categorical variable: C1
       Summary variables: C2
Select: Labels > Title/Footnotes
Enter: Household populations
Select: Slice Labels > select desired labels
Click: OK, then OK


The chart will come up in a new sheet.

PARETO DIAGRAMS

In attempting to get a pictorial representation of data, we must decide what type of graphic display would best present the data and their distribution. Consider Exercise 2.11. We are instructed to construct a Pareto diagram in this instance since this is a quality control application. In constructing a Pareto diagram for Exercise 2.11, we must enter words, called text data, in column 1 to indicate the category of the defect, and the corresponding frequency for each defect in column 2.

After entering the data in the worksheet and naming the columns, use the following commands to construct the Pareto diagram:

Menu Commands
Choose: Stat > Quality Tools > Pareto Chart
Click: Chart defects table
Enter: Labels in: C1
       Frequencies in: C2
       Combine all defects after 99% into one bar
Options: Title: Clothing Defects
Click: OK, then OK

Reminder: Use the Tab key to cycle through the prompts in the dialog box.


As you have already seen, each time you make a selection, MINITAB displays the corresponding command in the History folder. The History folder provides a permanent record of all the commands issued within the project. Now look at the last three commands that appear before the Pareto diagram. The display includes a subcommand, which provides additional information about the preceding command. Subcommands represent the options you select in dialog boxes.

DOT PLOTS

Dot plots are a quick and efficient way to get a preliminary understanding of the distribution of your data. A dot plot gives a picture of the data and also sorts the data into numerical order. Enter the data for Exercise 2.19 into column 3 of the current worksheet, and name the column PtsScrd. Construct a dotplot of the data listed in C3:

Menu Commands
Choose: Graph > Dotplot
Enter: Variables: PtsScrd
Select: One Y/Simple
Click: OK


Situations arise in which we wish to compare data from different populations. This can be accomplished with what is called a “side-by-side” dotplot. Consider Exercise 2.184. Retrieve the data from the Student’s Suite CD-ROM (EX02-184). The commands to construct the multiple dotplot are as follows:

Menu Commands
Choose: Graph > Dotplot
Select: Simple
Click: Multiple graphs
Choose: In separate panels of the same graph
Graph: C1 C2
Click: OK

The menu selections look as follows:


The dot plot appears in a new window.

STEM AND LEAF DISPLAY

To illustrate the commands necessary to construct a stem-and-leaf display, let's use the data in column 3 (points scored) from the previous worksheet.

Menu Commands
Choose: Graph > Stem-and-Leaf
Enter: Variable: C3
Uncheck “Trim Outliers”
Click: OK
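As a rough illustration of what a stem-and-leaf display does, the following Python sketch groups values by their tens digit (the stem) and lists the units digits (the leaves). It is not MINITAB's algorithm and prints only stems and leaves. Because the points-scored data are not reproduced in this guide, the sketch uses the ten test scores entered in Chapter 1 as stand-in data; the function name `stem_and_leaf` is made up for this example.

```python
from collections import defaultdict

# Stand-in data: the ten test scores entered in column 2 in Chapter 1.
# (Assumption: used only for illustration, not the Exercise 2.19 data.)
scores = [78, 94, 93, 81, 75, 62, 58, 50, 80, 79]


def stem_and_leaf(values):
    """Return {stem: [leaves]} using tens as stems and units as leaves."""
    display = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        display[stem].append(leaf)
    return dict(display)


display = stem_and_leaf(scores)

# Print every stem between the smallest and largest, even if it is empty.
for stem in range(min(display), max(display) + 1):
    leaves = "".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem} | {leaves}")
```

Sorting the values first is what puts the leaves on each stem in increasing order.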


The stem-and-leaf display appears in the Session window. Which part of the diagram is the stem? The leaf? What do the numbers to the left indicate? Try doing a stem-and-leaf display for these data using various increment values. Notice that originally we did not specify an increment. What was MINITAB's response? How does the diagram change for the different increments you chose? Which is more informative?

HISTOGRAMS

Histograms are more useful for large sets of data. We expect the histogram of a sample to be similar to that of the population. To illustrate the many options under the HISTOGRAM command, let's use the data in Exercise 2.39 (on the Student’s Suite CD). The HISTOGRAM command separates the data into intervals on the x-axis and draws a bar for each interval whose height, by default, is the number of observations (or frequency) in the interval.

Menu Commands
Choose: Graph > Histogram
Choose: Simple
Click: OK
Enter: Graph Variables: C1
Click: Scale
Select: the “Y-scale Type” tab
Choose: Frequency or Percent


[Figure: Histogram of GolfScor; x-axis: GolfScor, y-axis: Frequency]

With the most basic of histograms you do not get the detail necessary for a proper interpretation of the data. To get the following enhanced histogram, under “Labels” click “Show data labels” and select “Use y-value data labels” radio button. Click OK. OK

[Figure: Histogram of GolfScor with the frequency shown as a data label above each bar; x-axis: GolfScor, y-axis: Frequency]

Experiment with the many options within the HISTOGRAM command. Which of the options give you a clearer representation of the relationships within the data?


OGIVES

To construct an ogive, the class boundaries must be listed in C1 and the cumulative percentages listed in C2. Let's use Exercise 2.55 in your text as an example. The data are presented as a grouped frequency distribution. You need this same information presented as a cumulative relative frequency distribution:

Class Boundaries    Frequency    Cumulative Relative Frequency
 0 <= x <  4             4       4/50, or 0.08
 4 <= x <  8             8       12/50, or 0.24
 8 <= x < 12             8       20/50, or 0.40
12 <= x < 16            20       40/50, or 0.80
16 <= x < 20             6       46/50, or 0.92
20 <= x < 24             3       49/50, or 0.98
24 <= x <= 28            1       50/50, or 1.00

In a new worksheet, enter the class boundaries in C1 and the cumulative percentages in C2. Be sure to enter 0 for the percent paired with the lower boundary of the first class, and pair each cumulative percentage with the upper boundary of its class. To plot an ogive:

Menu Commands
Choose: Graph > Scatterplot
Choose: With Connect Line
Click: OK
Enter: Y variables: C2
       X variables: C1
Click: Labels
Enter: Title: KSW TEST SCORES
Click: OK


[Figure: Ogive titled KSW TEST SCORES; x-axis: C1 (0 to 30), y-axis: C2 (0.0 to 1.0)]
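The two worksheet columns for the ogive can be cross-checked outside MINITAB. The Python sketch below is only an illustration: it takes the frequencies from the Exercise 2.55 table above, accumulates them, and pairs each cumulative relative frequency with its class upper boundary, starting from 0 at the lower boundary of the first class. The variable names are chosen for this example.

```python
from itertools import accumulate

# Class upper boundaries and frequencies from the Exercise 2.55 table.
upper_boundaries = [4, 8, 12, 16, 20, 24, 28]
frequencies = [4, 8, 8, 20, 6, 3, 1]

total = sum(frequencies)  # number of observations
cumulative_counts = list(accumulate(frequencies))
cumulative_rel_freq = [count / total for count in cumulative_counts]

# The ogive starts at 0 at the lower boundary of the first class.
ogive_x = [0] + upper_boundaries       # values for C1
ogive_y = [0.0] + cumulative_rel_freq  # values for C2

for x, y in zip(ogive_x, ogive_y):
    print(f"{x:>2}  {y:.2f}")
```

The printed pairs are the values to enter in C1 and C2 before choosing Graph > Scatterplot.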

ASSIGNMENT: Do Exercises 2.7, 2.19, 2.43, 2.48, 2.54 in your text.


CHAPTER 2 - LAB SESSION 2 NUMERICAL PRESENTATION OF UNIVARIATE DATA

INTRODUCTION: The basic idea of descriptive statistics is to describe a set of data in a variety of abbreviated ways. In this lab you will investigate measures of central tendency and dispersion. The box-and-whiskers display, a graphical display of the 5-number summary of a set of data, will also be introduced.

MEASURES OF CENTRAL TENDENCY AND DISPERSION

Measures of central tendency and variation are the foundation of descriptive statistics, but most of these formulas are quite tedious to compute, even with a calculator. Fortunately, we can find a number of commonly used descriptive statistics using just a single command. Enter the data in Exercise 2.76 into C1. Get a dot plot of your data and visually approximate the "center".

[Figure: Dotplot of C1; x-axis: C1 (4 to 11)]

Calculate the mean and median using the following commands.

Menu Commands
Choose: Calc > Column Statistics
Select: Median
Enter: Input variable: C1
Click: OK


Median of C1

Median of C1 = 7

Choose: Calc > Column Statistics
Select: Mean
Click: OK

Mean of C1

Mean of C1 = 6.93333

If you are interested in a variety of statistics, including the median and mean, these values can be found more easily using the following:

Menu Commands
Choose: Stat > Basic Statistics > Display Descriptive Statistics
Enter: Variables: C1
Click: OK

The output appears in the Session window.

Descriptive Statistics: C1

Variable   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
C1        15   0  6.933    0.452  1.751    4.000  6.000   7.000  8.000   11.000

We can also find values by entering a formula and storing the result in a column. For example, the midrange = (high + low)/2. To do this in MINITAB we would do the following:

Select: Calc > Calculator
Store Result In: midrange
Type in the expression: (MAX(C1) + MIN(C1))/2

There is now a new column in the worksheet named midrange, which contains the result of the expression.

midrange
7.5


Visually locate the three calculated centers on the dot plot. Notice the three measures of central tendency are approximately the same. How well did you visually approximate the center? Now, place the values of C1 plus 4 into column C2, do a dot plot, visually locate the 'center', then determine the mean, median and midrange.

Select: Calc > Calculator
Store Result In: C2
Type in the expression: C1 + 4

[Figure: Dotplot of C2; x-axis: C2 (8 to 15)]

Descriptive Statistics: C2

Variable   N  N*    Mean  SE Mean  StDev  Minimum      Q1  Median      Q3  Maximum
C2        15   0  10.933    0.452  1.751    8.000  10.000  11.000  12.000   15.000

midrange
11.5

How did the three measures of central tendency (mean, median, and midrange) change?


Next, place the values of C1 times 3 into C3, and follow the procedure above.

[Figure: Dotplot of C3; x-axis: C3 (12 to 33)]

Descriptive Statistics: C3

Variable   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
C3        15   0  20.80     1.36   5.25    12.00  18.00   21.00  24.00    33.00

midrange
22.5

Compare the three measures of central tendency for the columns of data C1, C2, and C3. How and why did a change in the measures occur? If a different transformation were performed (such as dividing each entry in C1 by 2), could you make an educated guess about the effect on these three measures?

Consider Applied Example 2.11 in the text. Retrieve the data, do a dot plot, and calculate the mean, median, and midrange. What is there about the distribution of these ten data values that causes these three averages to be so different?
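The comparison of C1, C1 + 4, and C1 × 3 above can also be explored with a short Python sketch. The Exercise 2.76 data are not reproduced in this guide, so the `sample` list below is hypothetical, and the helper names `midrange` and `centers` are made up for this example; the point is only to show one way of computing the three measures before and after a transformation.

```python
from statistics import mean, median


def midrange(values):
    """Midrange = (high + low) / 2, as defined earlier in this lab."""
    return (max(values) + min(values)) / 2


def centers(values):
    """Return (mean, median, midrange) for a list of numbers."""
    return mean(values), median(values), midrange(values)


# Hypothetical sample (assumption: NOT the Exercise 2.76 data).
sample = [4, 6, 6, 7, 8, 11]
shifted = [x + 4 for x in sample]  # like C2 = C1 + 4
scaled = [x * 3 for x in sample]   # like C3 = C1 * 3

for name, data in [("sample", sample), ("shifted", shifted), ("scaled", scaled)]:
    m, med, mid = centers(data)
    print(f"{name:8s} mean={m} median={med} midrange={mid}")
```

Changing the transformation (for example, dividing by 2) and rerunning is a quick way to test a conjecture before trying it in MINITAB.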


[Figure: Dotplot of AnnIncom; x-axis: AnnIncom (28000 to 52000)]

Descriptive Statistics: AnnIncom

Variable   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
AnnIncom  10   0  35400     2413   7631    25500  31500   33375  37875    54000

midrange
39750

Compare the standard deviations for each of the previous four examples, along with how similar or how different the three measures of central tendency were. Can we use the standard deviation to predict whether we expect these three measures of central tendency to be quite similar or quite different?

FREQUENCY DISTRIBUTIONS

When the sample data are in the form of a frequency distribution, we can still use MINITAB to describe the distribution. The class marks need to be listed in one column with the corresponding frequencies in another. Start a new worksheet (Choose: File > New > Worksheet), and enter the following information, where X represents the number of radios in a household and Freq is the number of households having X radios:


X    Freq
1      20
2      35
3     100
4      90
5      65
6      40
7       5

Name C1 as X and C2 as Freq. Use the calculator to create C3, named xf (x times its frequency), and C4, named x2f (x squared times its frequency).
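As a cross-check on the calculator steps in this section, the same quantities can be sketched in Python. The snippet below builds the xf and x2f columns from the table above and then evaluates the mean, the sum of squares, the variance, and the standard deviation with the same formulas as the MINITAB expressions; the variable names are chosen for this example.

```python
from math import sqrt

# X (number of radios) and Freq (number of households) from the table above.
x = [1, 2, 3, 4, 5, 6, 7]
freq = [20, 35, 100, 90, 65, 40, 5]

xf = [xi * fi for xi, fi in zip(x, freq)]        # column C3
x2f = [xi ** 2 * fi for xi, fi in zip(x, freq)]  # column C4

n = sum(freq)                        # SUM(C2)
mean = sum(xf) / n                   # SUM(C3)/SUM(C2)
ss_x = sum(x2f) - sum(xf) ** 2 / n   # SUM(C4)-(SUM(C3)*SUM(C3))/SUM(C2)
variance = ss_x / (n - 1)
std_dev = sqrt(variance)

print(f"mean     = {mean:.5f}")
print(f"SS(x)    = {ss_x:.3f}")
print(f"variance = {variance:.5f}")
print(f"stddev   = {std_dev:.5f}")
```

The four printed values can be compared with the MINITAB results reported in this section.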

Determine the mean using the following expression:

SUM(C3)/SUM(C2)

mean
3.80282

Use the calculator to evaluate each of the following expressions:

SUM(C4)-(SUM(C3)*SUM(C3))/SUM(C2)

(SUM(C4)-(SUM(C3)*SUM(C3))/SUM(C2))/(SUM(C2)-1)

SQRT((SUM(C4)-(SUM(C3)*SUM(C3))/SUM(C2))/(SUM(C2)-1))

sum of x^2    variance    stddev
676.197       1.91016     1.38209

*Reminder: in the case of a grouped frequency distribution, enter the class marks in one column and the corresponding frequencies in another.

BOX-AND-WHISKER DISPLAY

The boxplot (MINITAB's name for the box-and-whisker display) is a simple graph that gives a graphic 5-number summary. Information about the center, dispersion, and skewness of a data set will be illustrated. We will use the data from Exercise 2.184 from Lab 1.

Menu Commands
Choose: Graph > Boxplot
Choose: One Y, Simple
Click: OK
Enter: Graph variables: Atmospheric (then Chemical)
Click: OK


A rectangle is constructed between the two quartiles, with a line across the box indicating the location of the median. The box encloses the middle half of the data. The whiskers extend in either direction to indicate the maximum and minimum values. The BOXPLOT command can also be used to produce a “side-by-side” boxplot for comparison among the variables.

Menu Commands
Choose: Graph > Boxplot
Choose: Multiple Y’s, Simple
Enter: Graph variables: Atmospheric, Chemical
Click: OK


Consider again the salary data presented in Application 2 - 2. Retrieve the data from the Student Suite CD and perform a BoxPlot of the data in column A.

[Figure: Boxplot titled Annual Income; y-axis: AnnIncom (25000 to 55000)]

The asterisk ( * ) included in the boxplot indicates an outlier, a data value that is far removed from the rest of the data.

ASSIGNMENT: Do Exercises 2.76, 2.118, 2.125, 2.126 and 2.128 in your text.


CHAPTER 3 - LAB SESSION 1 PRESENTATION OF BIVARIATE DATA

INTRODUCTION: It is frequently interesting to view the relationship of two variables. In this lab we will see how MINITAB can help us plot bivariate data and discover some trends in the relationship. We can set up the data as ordered pairs, with the independent variable as the x and the dependent variable as the y.

TABULAR PRESENTATION OF BIVARIATE DATA

We can arrange the data resulting from two qualitative variables in a cross tabulation or contingency table. These tables often show relative frequencies (percentages) that can be based on the entire sample, or on the subsample classification (either a row or a column). Let’s use the data in the Highway Speed Limits table in Exercise 3.6. Retrieve the data (EX03-006). Note that the data are arranged as follows: column C1 is titled State, column C2 is titled Cars, and column C3 is titled Trucks. To construct a cross-tabulation table of the two variables, vehicle type and maximum speed limit:

Menu Commands
Choose: Stat > Tables > Cross Tabulation and Chi-Square
Enter: Categorical Variables: For Rows: C2
                              For Columns: C3
Select: Counts
Click: OK


Results for: Ex03-06.MTW

Tabulated statistics: Cars, Trucks

Rows: Cars   Columns: Trucks

        55   60   65   70   75  All
55       1    0    0    0    0    1
65       2    1   17    0    0   20
70       2    1    1   12    0   16
75       0    0    1    1   11   13
All      5    2   19   13   11   50

Cell Contents: Count

Now let’s do the same thing, only this time select the Total percents box.

Tabulated statistics: Cars, Trucks

Rows: Cars   Columns: Trucks

        55   60   65   70   75  All
55       2    0    0    0    0    2
65       4    2   34    0    0   40
70       4    2    2   24    0   32
75       0    0    2    2   22   26
All     10    4   38   26   22  100

Cell Contents: % of Total

This table would be useful in answering questions such as: What percent of the states have a maximum speed limit of 75 for both cars and trucks? (22%) What percent of the states have different maximums for cars and trucks? (4% + 2% + 4% + 2% + 2% + 2% + 2% = 18%)
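The two answers above can be reproduced from the count table with a few lines of Python. This is a sketch for checking the arithmetic, not MINITAB output; the dictionary layout and variable names are chosen for this example, with the counts copied from the cross tabulation.

```python
# Truck speed limits (columns) and counts per car speed limit (rows),
# copied from the "Cell Contents: Count" table.
truck_speeds = [55, 60, 65, 70, 75]
counts = {
    55: [1, 0, 0, 0, 0],
    65: [2, 1, 17, 0, 0],
    70: [2, 1, 1, 12, 0],
    75: [0, 0, 1, 1, 11],
}

total = sum(sum(row) for row in counts.values())  # number of states

# Percent of total for every cell.
pct_total = {
    car: [100 * count / total for count in row]
    for car, row in counts.items()
}

# States with a limit of 75 for both cars and trucks.
both_75 = pct_total[75][truck_speeds.index(75)]

# States whose car and truck limits differ (all off-diagonal cells).
different = sum(
    pct
    for car, row in pct_total.items()
    for truck, pct in zip(truck_speeds, row)
    if car != truck
)

print(f"both 75:   {both_75:.0f}%")
print(f"different: {different:.0f}%")
```

Dividing each count by its row total or column total instead of the grand total gives the row and column percents in the same way.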

We can repeat the procedure for Row percents and Column percents.

Using “Row Percents”:

Tabulated statistics: Cars, Trucks

Rows: Cars   Columns: Trucks

           55      60      65      70      75     All
55     100.00    0.00    0.00    0.00    0.00  100.00
65      10.00    5.00   85.00    0.00    0.00  100.00
70      12.50    6.25    6.25   75.00    0.00  100.00
75       0.00    0.00    7.69    7.69   84.62  100.00
All     10.00    4.00   38.00   26.00   22.00  100.00

Cell Contents: % of Row


Questions easily answered using this table would be: What percent of the states whose maximum speed for cars is 65 have a maximum speed for trucks of 60? (5.00%)

What percent of the states whose maximum speed for cars is 55 have a higher maximum speed for trucks? (0%)

Using “Column Percents”:

Tabulated statistics: Cars, Trucks

Rows: Cars   Columns: Trucks

           55      60      65      70      75     All
55      20.00    0.00    0.00    0.00    0.00    2.00
65      40.00   50.00   89.47    0.00    0.00   40.00
70      40.00   50.00    5.26   92.31    0.00   32.00
75       0.00    0.00    5.26    7.69  100.00   26.00
All    100.00  100.00  100.00  100.00  100.00  100.00

Cell Contents: % of Column

What type of questions would easily be answered using this table?

SCATTER DIAGRAMS

To do a scatter diagram illustrating the relationship between two quantitative variables we will enter the data into two columns. For this illustration the data from Table 3-10 will be used (TA03-10).

Menu Commands
Choose: Graph > Scatterplot
Choose: Simple
Click: OK
Enter: Y variables: C2
       X variables: C1
       Labels > Title: your title
Click: OK


[Figure: Scatterplot titled Data for Push-ups and Sit-ups; x-axis: Sit_Ups (25 to 55), y-axis: Push_Ups (10 to 60)]


For the person(s) that did 35 push-ups, how many sit-ups were they able to do?

How many push-ups and sit-ups were done by the person represented by the dot in the upper right corner?

To compare these two variables in a different way, let's do a “side-by-side” box-and-whisker display.

[Figure: Boxplot of Push_Ups, Sit_Ups; y-axis: Data (10 to 60)]

Compare the two types of exercises. Which indicates greater range of ability? Which exercise do most of those sampled find more difficult to do (as measured by number done)?

ASSIGNMENT: Do Exercises 3.11, 3.25 in your text


CHAPTER 3 - LAB SESSION 2 CORRELATION AND REGRESSION

INTRODUCTION: Not only is it important to analyze single variables, but frequently one needs to determine if and how two variables are related. The correlation coefficient is a measure of the strength of the linear relationship between two variables. In these exercises you will use MINITAB to analyze this statistic, and these exercises will also give you a very brief introduction to linear regression.

INVESTIGATIONS OF THE CORRELATION COEFFICIENT

The data set below is a sample of weight and waist size for 11 women. You will use that data to estimate the correlation between a woman's weight and her waist size. Once that value has been determined you will show that this value is independent of the scale of the two variables.

Weights and Waist Sizes

weight (lbs):  110  143  120  127  143  111  137  154  123  104  140
waist (ins):    22   29   27   26   27   24   28   28   26   25   23

C1='weight' C2='waist'

Get a scatter diagram of the bivariate data set. The variable 'WEIGHT' should be on the x-axis and 'WAIST' on the y-axis.

Menu Commands
Choose: Graph > Scatterplot
Select: Simple
Enter: Y variables: C2
       X variables: C1
Click: Labels
Enter: Title: Weight and Waist Size
Click: OK

[Figure: Scatterplot titled Weight and Waist Size; x-axis: weight (100 to 160), y-axis: waist (22 to 29)]


Calculate the descriptive statistics of each variable.

Descriptive Statistics: weight, waist

Variable   N  N*    Mean  SE Mean  StDev  Minimum      Q1  Median      Q3  Maximum
weight    11   0  128.36     4.89  16.21   104.00  111.00  127.00  143.00   154.00
waist     11   0  25.909    0.667  2.212   22.000  24.000  26.000  28.000   29.000

Calculate the correlation coefficient, r.

Menu Commands
Choose: Stat > Basic Statistics > Correlation
Enter: Variables: C1 C2
Click: OK

Correlations: weight, waist

Pearson correlation of weight and waist = 0.603
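For readers who want to see what lies behind the Correlation command, the Pearson coefficient can be computed directly from its definition. The Python sketch below uses the weight and waist data listed above; the function name `pearson_r` is made up for this example, and the code is an illustration rather than a description of MINITAB's internals.

```python
from math import sqrt

# Weight (lbs) and waist (ins) for the 11 women in the sample above.
weight = [110, 143, 120, 127, 143, 111, 137, 154, 123, 104, 140]
waist = [22, 29, 27, 26, 27, 24, 28, 28, 26, 25, 23]


def pearson_r(xs, ys):
    """Pearson correlation: SS(xy) / sqrt(SS(x) * SS(y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return ss_xy / sqrt(ss_x * ss_y)


r = pearson_r(weight, waist)
print(f"r = {r:.3f}")  # agrees with the MINITAB output of 0.603
```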


QUESTIONS:

1. Would you say that the variables were positively or negatively correlated? Is there a strong or weak correlation?

2. If you were to add an equal amount of weight to each woman (assume no change in waist size), would the value of r, the correlation coefficient, change? Test your conjecture by adding 25 lbs. to each woman's weight and recalculating r.
   NOTE: Assign the results of C1 + 25 to C3, then calculate the correlation for C3 against C2.

3. If you were to change the scale of the variables (weight to kg and waist size to meters), would the value of r change? Test your conjecture by multiplying 'WEIGHT' by 0.453 and 'WAIST' by 0.0254 and recalculating r. How will the scatter diagram change when you change the scales?

4. The last observation in your data set was for a model known for her especially thin figure. If you eliminated it from the data set, how much would r change? Would you say that the statistic, r, is sensitive to extreme observations? Explain.

INTERPRETATION OF THE CORRELATION COEFFICIENT

In this next section, we will be examining some scatter diagrams of computer-generated data to gain a more thorough understanding of just what the value of the correlation coefficient means. For each pair of variables, you will calculate r and look at the corresponding scatter diagram.

Enter the values from 0 to 50 for your first variable and name your variable. (Reminder: Menu Commands Choose: Calc > Make Patterned Data > Simple Set of Numbers)

Name column C1 “X” and name C2 “Random”

Generate a set of random numbers.

Menu Commands Choose: Calc > Random Data > Normal Enter: Generate 51 rows of data Store in column(s): C2 Click: OK

Note: by default the 51 random numbers are from the normal distribution with mean 0 and standard deviation 1.


Get a scatter diagram of the two variables and calculate r.

[Figure: Scatterplot of Random vs X]

When comparing your output to that presented here, remember you are working with random data and there may be variation in results.

Correlations: X, Random
Pearson correlation of X and Random = -0.06

Generate a set of y values which has no random component using the expression

2 + 0.5 * C1 and store results in column 3.

Get a scatter plot of C3 versus C1 and determine the value of r.


[Figure: Scatterplot of C3 vs X]

Correlations: C3, X Pearson correlation of C3 and X = 1.000

Generate a set of y values that have a small random component and repeat above procedure. Use the expression 2 + 0.5 * C1 + C2 storing results in C4

Correlations: C4, X Pearson correlation of C4 and X = 0.989

[Figure: Scatterplot of C4 vs X]


Generate a set of y values that are negatively correlated, and repeat above procedure. Use the expression

2 - 0.5 * C1 + C2 storing the result in C5

Correlations: C5, X Pearson correlation of C5 and X = -0.990

Generate a set of y values that have a large random component and repeat previous procedure. Use the expression

5 + 0.5 * C1 + 2 * C2 storing the results in C6

Correlations: C6, X Pearson correlation of C6 and X = 0.958

[Figure: Scatterplot of C5 vs X]

[Figure: Scatterplot of C6 vs X]


Generate a set of y values that are non-linearly related to x. Use the expression SQRT(0.1*C1) storing the results in C7

Correlations: C7, X Pearson correlation of C7 and X = 0.974

Generate a second set of y values which are related, but not linearly related, to x and repeat the previous procedure. Use the expression 9 - (C1 - 25)**2 storing the results in C8

Correlations: C8, X Pearson correlation of C8 and X = -0.000
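The columns without a random component can be reproduced exactly, which makes them a convenient check on the calculations. The Python sketch below is not part of the MINITAB session; it rebuilds C1, C3, C7 and C8 from the expressions given above and computes r for each. The helper pearson_r is an illustrative name. A rounded value of zero may print as 0.000 or -0.000 depending on floating-point rounding.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = list(range(51))                     # C1: the integers 0 through 50
c3 = [2 + 0.5 * v for v in x]           # exact straight line
c7 = [math.sqrt(0.1 * v) for v in x]    # curved but always increasing
c8 = [9 - (v - 25) ** 2 for v in x]     # parabola, symmetric about x = 25

for name, column in (("C3", c3), ("C7", c7), ("C8", c8)):
    print(f"Pearson correlation of {name} and X = {pearson_r(x, column):.3f}")
```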

QUESTIONS:

1. Using the results from above, what type of relationship can you determine between the correlation coefficient and the scatter plot? What type of pattern do you see in the scatter diagram when r is close to zero? When r is close to one? What is the pattern like when r is negative?

2. Does r being close to zero imply that the two variables are unrelated? Check C8 versus C1 before answering this question.

[Figure: Scatterplot of C7 vs X]

[Figure: Scatterplot of C8 vs X]


LINEAR REGRESSION

We will illustrate the default output generated by the Minitab Regression command using Exercise 3.75. Retrieve the data from text Exercise 3.75 (EX03-075). Get a feeling for whether years of schooling and median usual weekly earnings are correlated by doing a scatterplot. Determine the linear correlation coefficient for these data.


Calculate the line of best fit. Menu Commands Choose: Stat > Regression > Regression Enter: Response: C2 Predictors: C3 Click: OK


Notice that a great deal of information is generated, but we only need the first two lines. We can also do the scatterplot with the regression line included. Just choose “Scatterplot with Regression”.

ASSIGNMENT: Do Exercises 3.20, 3.38, 3.45, 3.59, 3.99 in your text.


CHAPTER 5 - LAB SESSION RANDOM NUMBERS AND PROBABILITY

INTRODUCTION: This lab session is designed to introduce you to random numbers and their use in simulating experiments. The outcomes of events in normal life cannot be predicted, but it is possible to have an idea of what outcomes are possible. The theory of probability was developed to help analyze experiments whose outcomes are uncertain. We can use MINITAB to simulate certain experiments such as flipping a coin or rolling a die.

RANDOM NUMBERS

You were introduced to the RANDOM command in Chapter 3 - Lab Session 2. Remember, if you select the normal distribution it will generate a sequence of random numbers from the normal distribution with a mean of 0 and a standard deviation of 1.0. Different distributions require different parameters. You can specify these parameters depending on the distribution you choose.

You can restrict the numbers to integers within a certain range using the Integer subcommand. For example, if we wanted to simulate the outcomes for tossing a coin 100 times we would use the following commands:

Menu Commands Choose: Calc > Random Data > Integer Enter: Generate 100 rows of data Store in column(s): C1 Minimum value: 1 Maximum value: 2 Click: OK

Take a look at the results in column 1 of the worksheet.

Give the relative frequency for a head (1) and a tail (2) based on the MINITAB output for C1. Let the computer do the work:


Menu Commands Choose: Stat > Tables > Tally Individual Variables Enter: Variables: C1 Select: Percents Click: OK
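As a rough cross-check on the idea, the same experiment can be sketched in Python (not part of the MINITAB session). The seed value is an arbitrary assumption made so that the run is repeatable; your MINITAB counts will differ because the data are random.

```python
import random
from collections import Counter

random.seed(1)  # arbitrary seed (an assumption) so the run is repeatable

# 100 simulated coin tosses: 1 = head, 2 = tail, as in column C1.
tosses = [random.randint(1, 2) for _ in range(100)]
counts = Counter(tosses)

# Count and percent for each outcome, analogous to Tally with Percents selected.
for outcome in (1, 2):
    percent = 100 * counts[outcome] / len(tosses)
    print(f"{outcome}  count = {counts[outcome]}  percent = {percent:.2f}")
```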


Questions:
1. What commands would be used for simulating the rolling of a die 50 times?
2. Restart MINITAB and place 50 simulated rolls into each of columns C1 and C2. Give the relative frequency for the outcomes 1, 2, 3, 4, 5, and 6 based on the MINITAB output.

THE LAW OF LARGE NUMBERS

To see how the law of large numbers works, we need to create a third column, C3, with the sums of the two dice rolls simulated by columns C1 and C2 (created in the question set above); for example, use Calc > Calculator with the expression C1 + C2 and store the result in C3. Then tally the sums: Menu Commands Choose: Stat > Tables > Tally Individual Variables Enter: Variables: C3 Select: Counts Percents Cumulative Counts Cumulative Percents Click: OK

The results appear as the following table:


Tally for Discrete Variables: C3

C3  Count  CumCnt  Percent  CumPct
 2      3       3     6.00    6.00
 3      2       5     4.00   10.00
 4      7      12    14.00   24.00
 5      4      16     8.00   32.00
 6      6      22    12.00   44.00
 7      6      28    12.00   56.00
 8     11      39    22.00   78.00
 9      5      44    10.00   88.00
10      4      48     8.00   96.00
11      2      50     4.00  100.00
N= 50

Interpreting the results:
1) What is the observed probability of obtaining a sum of 2 on the dice?
2) What is the observed probability of obtaining a sum of 7 on the dice?
3) What is the observed probability of obtaining a sum of 11 on the dice?

Using similar commands, create C4 and C5 containing 500 simulated rolls of a single die and C6 containing the sums of these 500 simulated rolls of 2 dice.

4) Answer the above three questions about C6. How do the answers compare to the theoretical probability? (Use both numerical and graphical evidence.)
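The comparison between observed and theoretical probabilities can be sketched in Python as follows. This is an illustration only, not the MINITAB procedure: the function name simulate_sums and the seed are assumptions, and the theoretical values use the standard count of 6 - |s - 7| ways to roll a sum s with two fair dice.

```python
import random
from collections import Counter
from fractions import Fraction


def simulate_sums(n_rolls, rng):
    """Return the sums of n_rolls simulated rolls of two fair dice."""
    return [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls)]


# Theoretical probability of each sum s: (6 - |s - 7|) ways out of 36.
theoretical = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}

rng = random.Random(2024)  # arbitrary seed (an assumption) for repeatability
for n in (50, 500):
    observed = Counter(simulate_sums(n, rng))
    print(f"{n} rolls")
    for s in (2, 7, 11):
        print(f"  sum {s:2d}: observed {observed[s] / n:.3f}, "
              f"theoretical {float(theoretical[s]):.3f}")
```

With more rolls the observed relative frequencies tend to settle closer to the theoretical values, which is the behavior the law of large numbers describes.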

THE BINOMIAL PROBABILITY DISTRIBUTION

Consider the following situation: Suppose you bought four light bulbs. The manufacturers claim that 85% of their bulbs will last at least 700 hours. If the manufacturer is right, what are the chances that all four of your bulbs will last at least 700 hours? That three will last 700 hours, but one will fail before that?

Consider another situation. You've somehow gotten enrolled in a class in advanced Greek Mythology. You don’t know anything about mythology but you’re to take a pop quiz. You'll have to guess on every question. It's a multiple-choice test; each of the 20 questions has 3 possible answers. To pass you must get at least 12 correct. What are the chances that you'll pass?


How would you answer the above questions? MINITAB can help us with this by using the PDF (Probability Density Function) command to generate binomial probabilities. (Remember what a binomial distribution requires.)

Calculating Binomial Probabilities with PDF To obtain the probability of each possible outcome for a binomial distribution with n = 10 and p = 0.1, you will use the following commands. To issue menu commands in this case, you must first create a column with the values for which you wish to find the corresponding probabilities. So before issuing the following menu commands, restart MINITAB then create a column (say, C1) of patterned data containing the integers from 0 to 10.

Menu Commands Choose: Calc > Probability Distributions > Binomial Select: Probability Enter: Number of trials: 10 Probability of success: .1 Input column: C1 Click: OK


This results in the following table:

Probability Density Function
Binomial with n = 10 and p = 0.1

 x  P( X = x )
 0    0.348678
 1    0.387420
 2    0.193710
 3    0.057396
 4    0.011160
 5    0.001488
 6    0.000138
 7    0.000009
 8    0.000000
 9    0.000000
10    0.000000

Looking back to our original questions, to find the probability that exactly three of your four light bulbs will be successes (last at least 700 hours) and one will fail, we use:

Menu Commands Choose: Calc > Probability Distributions > Binomial Select: Probability Enter: Number of trials: 4 Probability of success: .85 Select: Input constant Enter: 3 Click: OK

Probability Density Function
Binomial with n = 4 and p = 0.85

x  P( X = x )
3    0.368475


If we wish to determine the probabilities for all values 0, 1, 2, ..., n, enter 0 through n into a column, and choose Input column, rather than input constant. You also have the option of storing the probabilities in another column.

Binomial with n = 4 and p = 0.85

x  P( X = x )
0    0.000506
1    0.011475
2    0.097538
3    0.368475
4    0.522006
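The table above follows directly from the binomial formula P(X = x) = C(n, x) p^x (1 - p)^(n - x). A minimal Python sketch of that formula is shown here as a check; it is not how MINITAB is operated, and the function name binomial_pmf is an illustrative choice.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 4, 0.85  # the light bulb example
table = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

print("x  P( X = x )")
for x, prob in table.items():
    print(f"{x}    {prob:.6f}")
```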

Cumulative Distribution Function (CDF)

The CDF command calculates cumulative probabilities. A cumulative probability is the probability that your result will be less than or equal to a particular value. As an example, suppose we calculate the probability you will fail the test in advanced Greek Mythology. Here n = 20 and p = .3333. You will fail the test if you get less than or equal to 11 questions correct. (You will pass if you get 12 or more right.) The following commands can be used to calculate this probability:

Menu Commands Choose: Calc > Probability Distributions > Binomial Select: Cumulative probability Enter: Number of trials: 20 Probability of success: .333333 Select: Input constant Enter: 11 Click: OK

Cumulative Distribution Function
Binomial with n = 20 and p = 0.333333

 x  P( X <= x )
11     0.987027

This is the probability you will fail. So, what is the probability that you will pass? ( 1 - .9870 = .013)
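The cumulative probability is simply the sum of the individual binomial probabilities from 0 up to the cutoff. The Python sketch below illustrates this for the quiz example; it is not a MINITAB command, the name binomial_cdf is an illustrative choice, and it uses p = 1/3 exactly rather than the rounded .333333.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for a binomial distribution, summed term by term."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k + 1))


# Greek Mythology quiz: 20 questions, 3 choices each, pure guessing.
p_fail = binomial_cdf(11, 20, 1 / 3)   # 11 or fewer correct
p_pass = 1 - p_fail                    # 12 or more correct

print(f"P(fail) = {p_fail:.6f}")
print(f"P(pass) = {p_pass:.6f}")
```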

Again, we can place the results of the CDF command in a particular column of the worksheet by entering a column in the Optional storage specification in the dialog box.


Mean and Standard Deviation of the Binomial Distribution

Using the actual values for n, p and q, the calculator within Calc>Calculator in MINITAB can be used to determine the mean and standard deviation for the binomial distribution. Use the expressions:

n * p           store result in “mean”
SQRT(n*p*q)     store result in “std dev”

(Remember: you are substituting the actual values for n, p and q.)
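The two calculator expressions can be checked with a short Python sketch, shown here for n = 10 and p = 0.1 (the values used earlier in this session). As a cross-check, the sketch also recomputes both quantities from the full probability table using the general formulas for a discrete distribution, Σ x·P(x) and sqrt(Σ x²·P(x) − mean²). The variable names are illustrative assumptions.

```python
from math import comb, sqrt

n, p = 10, 0.1
q = 1 - p

# Shortcut formulas for the binomial distribution.
mean = n * p
std_dev = sqrt(n * p * q)

# Cross-check from the probability table (x values and their probabilities).
xs = list(range(n + 1))
probs = [comb(n, x) * p ** x * q ** (n - x) for x in xs]
mean_from_table = sum(x * pr for x, pr in zip(xs, probs))
sd_from_table = sqrt(sum(x ** 2 * pr for x, pr in zip(xs, probs)) - mean_from_table ** 2)

print(f"mean = {mean:.4f} (from table: {mean_from_table:.4f})")
print(f"std dev = {std_dev:.4f} (from table: {sd_from_table:.4f})")
```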

We could also use the columns created in the worksheet to do the calculations. Assuming C1 contains the “X” values, and C2 contains the corresponding probabilities

SUM(C1*C2) will produce the mean

SQRT(SUM(C1**2*C2)-(SUM(C1*C2))**2) will produce the standard deviation

ASSIGNMENT: Do Exercises 4.32, 5.36, 5.68, 5.69 in your text.


CHAPTER 6 - LAB SESSION NORMAL APPROXIMATION OF THE BINOMIAL

INTRODUCTION: The normal distribution is one of the most important distributions in statistics. We will now see how binomial probabilities can be reasonably estimated by using the normal probability distribution. Later we will need to determine whether normality is a reasonable assumption. A common graphical technique for checking whether a sample comes from a normal population is to create a normal probability plot (NPP). It is a pairing of values with their corresponding z-scores. If the points fall approximately on a straight line, it is reasonable to treat the data as normally distributed; otherwise it is not.

We will start our investigation with a few specific binomial distributions.

Step 1: Entering the data. For this demonstration we will use columns C1, C4, and C7 to hold a series of numbers. The corresponding probabilities will be placed into C2, C5 and C8. Use the following commands to enter the numbers 0, 1, 2, 3, and 4 into column 1: Menu Commands Choose: Calc > Make Patterned Data > Simple Set of Numbers Enter: Store patterned data in: C1 From first value: 0 To last value: 4 In steps of: 1 Click: OK

Use similar commands to set C4 to the numbers 0, 1, ..., 8 and to set C7 to the numbers 0, 1, 2, ..., 24. These three columns will be used for three specific situations: n = 4, n = 8, and n = 24.

Step 2: Calculating and storing the probabilities. We will now place the binomial probabilities for C1 into C2 using the PDF (Probability Density Function) command with n = 4 and p = .5. Menu Commands Choose: Calc > Probability Distributions > Binomial Select: Probability Enter: Number of trials: 4 Probability of success: .5 Input column: C1 Optional storage: C2 Click: OK


Place the binomial probabilities for C4 into C5 and C7 into C8, being sure to use n = 8 and n = 24, respectively.

Step 3: Plotting the probabilities. Now we will plot the probability of each x from 0 to n for n = 4 by using the following commands:

Menu Commands Choose: Graph > Scatterplot > Simple Enter: Y variables: C2 X variables: C1 Click: Data View tab Check: Project line Click: OK


[Figure: Scatterplot of C2 vs C1]

Repeat this procedure for plotting C5 versus C4 and C8 versus C7. What can we say about the distribution as n becomes larger?

Step 4: Interpreting the results. Let's see how the normal approximates a binomial with p = .5 and n = 8. The approximating normal has mu = 8(.5) = 4 and sigma = sqrt((8)(.5)(.5)) = 1.414. First, we need to place the normal probabilities for each x (C4) into another column, say C6.

Menu Commands Choose: Calc > Probability Distributions > Normal Select: Probability Density radio button Enter: Mean 4 Standard Deviation 1.414 Input column: C4 Optional storage: C6 OK


Then to do a multiple plot, use the following commands: Menu Commands Choose: Graph > Scatterplot > Simple Enter: Y variables: C5 and C6 X variables: C4 and C4 (one Y, X pair per row) Click: Multiple graphs Select: Overlaid on same graph Click: OK

[Figure: Scatterplot of C5, C6 vs C4 (x-axis C4, y-axis Y-Data; legend: Variable C5, C6)]

The ScatterPlot command just executed plotted the pdf for the binomial and for the normal approximation on the same axes. This will help us see why we can approximate a binomial by a normal and how to do the appropriate calculations. The MPLOT plots the first pair of columns with the black dot and the second pair with the red box. When two points overlap, it plots both. In this multiple plot there are many overlaps. You should visualize the histogram corresponding to the binomial probabilities. The height of a bar is the probability the binomial variable is equal to the corresponding value. For example, the height of the bar centered at 5 is the probability that the binomial variable is equal to 5. The base of a bar is 1 unit wide. Therefore, the area of a bar is equal to its height, and is thus equal to the corresponding probability.


Also visualize the normal curve. (Draw a smooth curve through the values plotted from the normal pdf.) Here are some calculations that will help the explanation. Suppose we want the probability that the binomial variable is from 5 to 7. This probability is the sum of the probabilities at 5, 6, and 7. (Look in rows 6, 7, and 8 of the worksheet, where C4 holds these x values and C5 the corresponding binomial probabilities.) The area under the normal curve from 4.5 to 7.5 approximates the area of the three binomial bars. How could we determine this area? Hint: In C11 enter 4.5 and 7.5. Then calculate the normal CDF for these two numbers and store the results in C12.

Menu Commands Choose: Calc > Probability Distributions > Normal Select: Cumulative probability Enter: Mean: 4 Standard deviation: 1.414 Input column: C11 Optional storage: C12 Click: OK

The probability that the binomial variable has a value from 5 to 7 is .359375. The approximation obtained from the normal probability distribution is .993343 - .638183 = .355160, which is very close to the true probability. If we were to use a normal approximation for a binomial with p = .5 and n = 24 (as we did in columns C5 and C6 for n = 8), the approximation would look even better. In the exercises, we'll look at other values of p.
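The comparison just described can be reproduced with a short Python sketch, given here only as an independent check of the arithmetic. It uses the exact value of sigma rather than the rounded 1.414, so the last digits may differ slightly from the MINITAB figures quoted above.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 8, 0.5
mu = n * p                      # 4
sigma = sqrt(n * p * (1 - p))   # about 1.414

# Exact binomial probability of 5, 6 or 7 successes.
exact = sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(5, 8))

# Normal approximation with continuity correction: area from 4.5 to 7.5.
normal = NormalDist(mu, sigma)
approx = normal.cdf(7.5) - normal.cdf(4.5)

print(f"exact binomial P(5 <= X <= 7) = {exact:.6f}")
print(f"normal approximation          = {approx:.6f}")
```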


EXTENSIONS OF THE THEORY

The Normal Probability Plot

The z-scores can be calculated by MINITAB using the following commands and specifying the column in which to place the values. We will calculate the z-scores for C1, place them in C3, and then plot them as follows:

Menu Commands Choose: Calc > Standardize Enter: Input column: C1 Result in: C3 Select: Subtract mean and divide by std dev Click: OK

Choose: Graph > Scatterplot > Simple Enter: Y variables: C3 X variables: C1 Click: OK


[Figure: Scatterplot of C3 vs C1]

Do the scores fall approximately on a straight line?

Repeat this for n = 8 and n = 24. Let's repeat this for other distributions:

1) Generate 50 random numbers, normally distributed, and place them in column 10. Calculate their z-scores and place them in column 11. Then plot C10 against C11.

2) Repeat this for 100 random numbers in C12.

3) Repeat this again for a uniform distribution in C14. (Replace the subcommand Normal with Uniform 0 1.)

Are you getting approximately straight lines?

ASSIGNMENT:

1. (a) Make plots as in the first part of the lab, but use p = .4 instead of p = .5. Use n = 4, 8 and 24.
   (b) Repeat part (a) using p = .2.
   (c) What can you say about the normal approximation to the binomial? For what values of n and p does it seem to work best?


2. Suppose X has a binomial distribution with p = .8 and n = 25. Use MINITAB to calculate each of the probabilities below exactly. Also compute the normal approximation to these probabilities. (Remember to use continuity correction.) Compare the binomial results with the normal approximations.

   (a) P(X = 21)
   (b) P(X < 21)
   (c) P(X > 24)
   (d) P(21 < X < 24)

3. Do Exercises 6.103 and 6.133 in your text.


CHAPTER 7 - LAB SESSION SAMPLE VARIABILITY

INTRODUCTION: In an effort to predict population parameters, we need to investigate the variability in the sample means obtained from repeated sampling. The Central Limit Theorem tells us that, for sufficiently large samples, the sampling distribution of sample means, x̄, is approximately normally distributed. In the following lab you will test the results of the Central Limit Theorem.

GENERATING THE DISTRIBUTIONS OF SAMPLE MEANS

Uniform Distribution

Enter the values 0 through 9 into column 1 and name column 1 'X': Menu Commands: Choose: Calc > Make Patterned Data> Simple Set of Numbers Store patterned data in: C1 From first value: 0 To last value: 9 In steps of: 1 OK

Enter the probabilities into column 2. For the uniform distribution assign probabilities of .1 to the x-values 0 through 9. Name column 2 'UNIFORM':

Generate 30 sets of 100 uniform deviates (random numbers with a uniform distribution) and store them in C6 through C35. Menu Commands Choose: Calc > Random Data>Discrete Generate 100 rows of data Store in columns: C6-C35 Values in: C1 Probabilities in: C2 OK


Observe the distribution of the data in C35 by creating a dotplot.

[Figure: Dotplot of C35]

To illustrate the concept of a sampling distribution we're considering the finite population {0, 1, 2, ..., 9}. We shall generate values from three very different distributions and investigate, empirically, the sampling distributions of the sample means for samples of size n = 2, n = 5, and n = 30 for each of the different distributions.

(N=2) Calculate the sample mean, x̄, for each pair of values given in C6 and C7 and store the results in C41. Menu Commands Choose: Calc > Row Statistics Select: Mean Enter: Input variables: C6 C7 Store result in: C41 Click: OK


Observe the distribution of the sample means in C41 by doing a dotplot as above.

[Figure: Dotplot of N = 2 (means over 2 columns, C6 and C7)]

Notice that this distribution of sample means does not look like the population.

(N=5) Calculate x̄ for the values in C8 through C12, storing your results in C42 as we did above.


Observe the distribution of the sample means in C42:

[Figure: Dotplot of N=5 (means over 5 columns, C8 - C12)]

(N=30) Repeat the above procedure for the values in C6-C35, storing your results in C43. Do a dotplot.


[Figure: Dotplot of N=30 (means over 30 columns, C6-C35)]

Compare the descriptive statistics and distributions for each of the calculated means.

Descriptive Statistics: C35, N = 2, N=5, N=30

          Total
Variable  Count    Mean  TrMean   StDev      Minimum      Q1  Median      Q3  Maximum
C35         100   4.280   4.256   2.995  0.000000000   2.000   4.000   7.000    9.000
N = 2       100   4.180   4.172   1.811        0.500   2.625   4.000   5.500    8.000
N=5         100   4.562   4.571   1.386        0.800   3.650   4.600   5.400    8.200
N=30        100  4.4100  4.4159  0.4880       3.0000  4.1083  4.4000  4.6667   5.5000
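The pattern in this table (means near 4.5, standard deviations shrinking as n grows) can be explored with a small Python simulation. This is a sketch under stated assumptions rather than a reproduction of the MINITAB run: the seed and the helper name sample_means are arbitrary, and each sample is drawn independently instead of reusing worksheet columns, so the numbers will not match the table exactly.

```python
import random
import statistics

rng = random.Random(7)      # arbitrary seed (an assumption) for repeatability
values = list(range(10))    # the population {0, 1, ..., 9}
weights = [0.1] * 10        # uniform probabilities, as in column 'UNIFORM'


def sample_means(n, n_samples=100):
    """Means of n_samples random samples, each of size n."""
    return [statistics.mean(rng.choices(values, weights=weights, k=n))
            for _ in range(n_samples)]


results = {n: sample_means(n) for n in (2, 5, 30)}
for n, means in results.items():
    print(f"N={n}: mean = {statistics.mean(means):.3f}, "
          f"StDev = {statistics.stdev(means):.3f}")
```

Replacing the weights list with another set of probabilities that sums to 1 gives the corresponding experiment for a differently shaped population.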


Now do a dotplot to compare them. Note the shape of each of the distributions of the sample means. These distributions don't look like the original data (C35), but they do have a shape we're familiar with.

[Figure: Dotplot of C35, N = 2, N=5, N=30 on a common Data axis. Each symbol represents up to 2 observations.]


J-Shaped Distribution

Enter the following probabilities into column 2: .39 .26 .22 .18 .15 .13 .12 .10 .05 .02 and repeat the previous procedure.

U-Shaped Distribution

Enter the following probabilities into column 2: .18 .15 .09 .06 .02 .02 .06 .09 .15 .18 and repeat the previous procedure.

Questions:
1. What are the parameter values for each of the three distributions?
2. What happened to the means and standard deviations of the x̄'s as n got larger?
3. How did the distributions of x̄'s compare to the normal distribution as n got larger? Were the results similar for the different distributions?
4. Do Exercises 7.9, 7.15, 7.40, 7.45 and 7.46 in your text.


CHAPTER 8 - LAB SESSION ESTIMATION AND HYPOTHESIS TESTING

INTRODUCTION: Two indispensable statistical decision-making tools for a single parameter are (i) confidence intervals, and (ii) hypothesis tests to investigate theories about parameters. In this lab you will learn how to calculate confidence intervals and perform hypothesis tests using MINITAB.

CONFIDENCE INTERVALS

Begin a new worksheet and enter 10 random integers into C1 with a minimum value of 0 and a maximum value of 25. Do this as follows: Menu Commands Choose: Calc > Random Data > Integer Enter: Generate 10 rows of data Store in column(s): C1 Minimum value: 0 Maximum value: 25 Click: OK

To see the mean, standard deviation, and maximum and minimum values for the data set, use the menu selection Stat > Basic Statistics > Display Descriptive Statistics. If you click on the Statistics tab, you can choose whichever statistics you wish to display.

(Your results may be slightly different, since we are using random data.) Find the 99% confidence interval for the mean of C1 using the following menu commands, and record the results. Note that this procedure requires the standard deviation (sigma) and the specified level for the confidence interval.


Menu Commands Choose: Stat > Basic Statistics > 1-Sample z Select: Samples in columns Enter: C1 Enter: Standard deviation: 7.5 Select: Options tab Enter: Confidence level: 99.0 Alternative: not equal Click: OK OK

Find the 95% and 90% confidence intervals of C1 and record the results.
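The interval MINITAB reports is x̄ ± z·σ/√n, where z is the critical value for the chosen confidence level. The Python sketch below illustrates that calculation on freshly generated random integers; it is not MINITAB output. The seed, the function name z_interval, and the generated data are assumptions, so the endpoints will differ from your worksheet.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def z_interval(data, sigma, level):
    """Two-sided z confidence interval for the mean, with known sigma."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sigma / sqrt(len(data))   # maximum error E
    center = mean(data)
    return center - margin, center + margin


rng = random.Random(0)                         # arbitrary seed (an assumption)
c1 = [rng.randint(0, 25) for _ in range(10)]   # 10 random integers from 0 to 25

intervals = {level: z_interval(c1, 7.5, level) for level in (0.99, 0.95, 0.90)}
for level, (low, high) in intervals.items():
    print(f"{level:.0%} CI: ({low:.4f}, {high:.4f})")
```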


Looking at these three intervals:

1. Consider the means obtained from 100 samples of size 10. If these means were used to construct 100 confidence intervals, determine the expected number of times the population mean would be included in one of these intervals.

2. In the 99% confidence interval that you found, the level of confidence is 99%. What is the value of α? What does α represent?

3. In which of these intervals is the maximum error, E, the smallest? What does this mean? In which of these intervals are you being more certain to include the population mean?

HYPOTHESIS TESTING

To understand the results of a computer-driven hypothesis test, it is best to show one first. An example of MINITAB's output for a z-test for data in C1 is given below. The statistics that you need, the test statistic and the p-value, are the last two values on the bottom line.

Menu Commands Choose: Stat > Basic Statistics > 1-Sample z Select: Samples in columns Enter: C1 Enter: Standard deviation: 7.5 Test mean: 15 (required for test) Select: Options tab Enter: Confidence level: 99.0 Alternative: less than Click: OK OK

The alternative hypothesis is chosen in the drop-down menu.


Another example: A standard final examination in an elementary statistics course is designed to produce a mean score of 75. The hypothesis you will try to verify is: "This particular statistics class is above average." At the .05 level of significance, test the claim that the following sample scores reflect an above-average class:

79 79 78 74 82 89 74 75 78 73 74 84 82 66 84 82 82 71 72 83

Enter the data and get a preliminary graphical analysis. Your menu selections would be Graph > BoxPlot > Simple.

[Figure: boxplot titled "Final exams" (y-axis: Final exam)]


Descriptive Statistics: Final exam

Variable     Mean  TrMean  StDev  Minimum     Q1  Median     Q3  Maximum
Final exam  78.05   78.11   5.60    66.00  74.00   78.50  82.00    89.00

Test the hypothesis, "The mean test grade for this class is greater than 75." Assume sigma = 12.

Menu Commands Choose: Stat > Basic Statistics > 1-Sample z Select: Samples in Columns Enter: C1 Enter: Test mean: 75 Enter: Standard Deviation: 12 Select: Options tab Click: Alternative greater than Click: OK

One-Sample Z: Final exam

Test of mu = 75 vs > 75
The assumed standard deviation = 12

                                          99% Lower
Variable     N     Mean   StDev  SE Mean      Bound     Z      P
Final exam  20  78.0500  5.5958   2.6833    71.8078  1.14  0.128
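The Z and P values in this output come from z = (x̄ − μ0)/(σ/√n) and the upper-tail area beyond z. A minimal Python sketch of that calculation, using the 20 exam scores listed above, is given here as a check; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean

scores = [79, 79, 78, 74, 82, 89, 74, 75, 78, 73,
          74, 84, 82, 66, 84, 82, 82, 71, 72, 83]

mu0 = 75      # hypothesized mean (test mean)
sigma = 12    # assumed population standard deviation

se_mean = sigma / sqrt(len(scores))
z = (mean(scores) - mu0) / se_mean
p_value = 1 - NormalDist().cdf(z)   # upper tail, since the alternative is "greater than"

print(f"Mean = {mean(scores):.4f}  SE Mean = {se_mean:.4f}")
print(f"Z = {z:.2f}  P = {p_value:.3f}")
```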

Questions:

1. What are the formal null and alternative hypotheses?

2. What is the value of the test statistic, and what is your decision? Is the mean of this class above “average”?


ASSIGNMENT: Do Exercises 8.41 and 8.115 in your text, and the following two problems.

1. In one region of a city, a random survey of households includes a question about the number of people in the household. The results are given in the accompanying frequency table. Construct the 90% confidence interval for the mean size of all such households. Assume that the sample standard deviation can be used as an estimate of the population standard deviation.

Household size   1   2   3   4   5   6   7
frequency       15  20  37  23  14   4   2

2. An aeronautical research team collects data on the stall speeds (in knots) of ultralight aircraft. The results are summarized in the accompanying stem-and-leaf plot. Construct the 95% confidence interval for the mean stall speed of all such aircraft. Assume σ = 1.

MTB > Stem-and-Leaf c1.

Stem-and-leaf of C1 N = 16
Leaf Unit = 0.10

21. | 7 8
22. | 3 4 4 6
23. | 2 2 5 8 9 9
24. | 0 1 3
25. | 2


CHAPTER 9 - LAB SESSION 1 ANALYZING MEAN AND VARIANCE (SIGMA UNKNOWN)

INTRODUCTION: The t-statistic is used when making inferences concerning the population mean when sigma is unknown. We will introduce the t-test and compare the z and t distributions.

THE TINTerval

To generate a confidence interval using the t-statistic we use the One-Sample T command, specifying the level of confidence and the column of data for which the estimation is being made.

Consider the data presented in exercise 9.31 of your text. Enter the data into C1. Before we complete a 95% confidence interval estimate for the mean length of lunch breaks at Giant Mart, we check the normal probability plot and boxplot to verify the normality assumption.

[Figure: Probability Plot of C1 (Normal). Vertical axis: Percent; horizontal axis: C1. Legend: Mean 29.32, StDev 4.922, N 22, AD 0.461, P-Value 0.235]

[Figure: Boxplot of C1]

The normality assumption is satisfied, so to complete the 95% confidence interval estimate for the mean length of lunch breaks at Giant Mart, use the following commands:

Menu Commands
Choose: Stat > Basic Statistics > 1-Sample t
Select: Samples in columns: C1
Select: Options
Enter: Confidence level: 95
Select: Alternative: not equal
Click: OK OK


One-Sample T: C1

Variable   N     Mean   StDev  SE Mean          95% CI
C1        22  29.3182  4.9221   1.0494  (27.1358, 31.5005)

Suppose you were given summarized data, as in Exercise 9.26. In this problem you are given the sample size n = 41, Σx = 3582.17 and Σ(x − x̄)² = 9960.336. You are asked to give a 90% confidence interval to estimate the true mean cost. We can follow the same procedure as above, but this time choose “Summarized data” instead of “Samples in columns”. You will have to calculate the mean and standard deviation by hand first: x̄ = Σx/n = 87.37 and s = √(Σ(x − x̄)²/(n − 1)) = 15.78. Note that the output below was produced at the 95% level; enter 90 as the confidence level under Options to obtain the interval the exercise asks for.

One-Sample T

 N     Mean    StDev  SE Mean          95% CI
41  87.3700  15.7800   2.4644  (82.3892, 92.3508)
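The hand calculation for the summarized data can be sketched as follows. This is an illustration in Python rather than a MINITAB feature. The critical values in `T_CRIT` are an assumption: they are read from a standard t table for 40 degrees of freedom, and the helper name `t_interval` is hypothetical.

```python
import math

n = 41
sum_x = 3582.17         # sum of the data values
sum_sq_dev = 9960.336   # sum of squared deviations from the mean

mean = sum_x / n
s = math.sqrt(sum_sq_dev / (n - 1))   # sample standard deviation
se = s / math.sqrt(n)                 # standard error of the mean

# Assumption: two-sided critical values from a t table, df = 40
T_CRIT = {90: 1.684, 95: 2.021}

def t_interval(level):
    """Return (lower, upper) for the given confidence level in percent."""
    margin = T_CRIT[level] * se
    return mean - margin, mean + margin

print("mean:", round(mean, 2), "s:", round(s, 2), "SE:", round(se, 4))
print("95% CI:", t_interval(95))
print("90% CI:", t_interval(90))
```

Because the table values are rounded to three decimals, the limits may differ from MINITAB's in the last digit or two.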

THE TTEST

Using text exercise 9.29 as the basis of our discussion, enter the data values into column C2. Suppose we have been asked to determine whether this accelerator has decreased the drying time by significantly more than 4% at the 0.01 level. The hypotheses to be tested are:

H0: µ = 4.0
Ha: µ > 4.0

To perform the test, use the following commands:

Menu Commands
Choose: Stat > Basic Statistics > 1-Sample t
Select: Samples in columns
Enter: C2
Enter: Test mean: 4.0
Select: Options
Enter: Confidence level: 95
Select: Alternative: greater than
Click: OK


One-Sample T: DryTime

Test of mu = 4 vs > 4

                                          95% Lower
Variable  N     Mean    StDev  SE Mean      Bound     T      P
DryTime   8  4.56250  1.34051  0.47394    3.66458  1.19  0.137

Is there sufficient evidence to show that this accelerator has decreased the drying time significantly more than 4% at the .01 level?

As another example, consider the point spread between opposing teams in the 1996 bowl games:

5 20 19 33 6 10 7 18 29 41 6 32 9 36

Enter the data into column C3. Test the hypothesis, "The average spread between the scores of the winning and the losing teams in a college bowl game is less than 20." Assume sigma is unknown. Follow the same steps as above, making the appropriate choices.

One-Sample T: Pt Spread

Test of mu = 20 vs < 20

                                             95% Upper
Variable    N     Mean    StDev  SE Mean      Bound      T      P
Pt Spread  14  19.3571  12.7013   3.3946    25.3687  -0.19  0.426
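To see where the T and P values in the point-spread printout come from, here is a rough sketch in Python using only the standard library. The helper `t_cdf` is hypothetical: it approximates the Student t cumulative probability by numerically integrating the density with Simpson's rule, which is an assumption of this sketch and not how MINITAB computes it.

```python
import math
import statistics

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) for Student's t by Simpson's rule (steps must be even)."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps
    area = pdf(0) + pdf(abs(t))
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3                     # area under the density between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

spread = [5, 20, 19, 33, 6, 10, 7, 18, 29, 41, 6, 32, 9, 36]
mu0 = 20

n = len(spread)
xbar = statistics.mean(spread)
s = statistics.stdev(spread)                 # sample standard deviation
t_stat = (xbar - mu0) / (s / math.sqrt(n))
p_lower = t_cdf(t_stat, n - 1)               # left-tail area, since Ha is "less than"

print(round(t_stat, 2), round(p_lower, 3))
```

The left tail is used here because the alternative is "less than"; for the drying-time example the right tail, 1 - t_cdf(t, df), would apply.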

Questions:

1. What are the formal null and alternative hypotheses?

2. What is the value of the test statistic, and what is your decision if α = .10? Is the final point spread of a bowl game less than 20?

3. What does the size of the p-value tell us?

ASSIGNMENT: Do Exercises 9.56, 9.60 in your text


COMPARISON OF THE Z AND T DISTRIBUTION

Why do you use two different distributions depending on the availability of the standard deviation, σ ? What basic assumptions are necessary to use the t-statistic? Is the basic assumption that the parent population is normally distributed a necessary one? Why? If the parent population is not known to be normally distributed, when can we use the t-statistic? In this exercise you will generate both types of statistics from the same 100 samples and be able to compare the two empirical distributions.

Open a new worksheet, then generate 100 samples of size 5 from a normal distribution with mu = 15 and sigma = 10, and store the mean and standard deviation of each of the 100 samples.

Menu Commands
Choose: Calc > Random Data > Normal
Generate 100 rows of data
Store in columns: C1-C5
Mean: 15
Standard deviation: 10
Click: OK

Calculate both z and t statistics. Recall:

$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}, \qquad t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$

Menu Commands
Choose: Calc > Row Statistics
Select: Mean radio button
Input variables: C1-C5
Store results in: C6
Click: OK

Choose: Calc > Row Statistics
Select: Standard deviation radio button
Input variables: C1-C5
Store results in: C7
Click: OK

Choose: Calc > Calculator
Store in: C8
Expression: (C6-15)/(10/SQRT(5))
Click: OK

Choose: Calc > Calculator
Store in: C9
Expression: (C6-15)/(C7/SQRT(5))
Click: OK


For each of the two statistics, z and t, count the number of times their value is more than 2 units away from the mean of 0. To do this, sort columns C8 and C9 and observe the data. Compare the two distributions graphically by using histograms (multiple graphs, overlay).
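The same experiment can be imitated outside MINITAB. The sketch below, in Python, assumes the settings used above (100 samples of size 5, mu = 15, sigma = 10); the seed value is arbitrary and only makes the run repeatable, and the counts will differ from your MINITAB worksheet because the random data differ.

```python
import math
import random
import statistics

random.seed(1)   # arbitrary seed so the run is repeatable
MU, SIGMA, N, SAMPLES = 15, 10, 5, 100

z_values, t_values = [], []
for _ in range(SAMPLES):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)
    z_values.append((xbar - MU) / (SIGMA / math.sqrt(N)))   # uses the known sigma
    t_values.append((xbar - MU) / (s / math.sqrt(N)))       # uses the sample s

# Count how many statistics fall more than 2 units from 0
z_extreme = sum(abs(z) > 2 for z in z_values)
t_extreme = sum(abs(t) > 2 for t in t_values)
print("z beyond 2:", z_extreme, " t beyond 2:", t_extreme)
```

Rerunning with a different seed shows how much the two counts vary from one set of 100 samples to the next.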

[Figure: Histogram of z. Vertical axis: Frequency; horizontal axis: z]

[Figure: Histogram of t. Vertical axis: Frequency; horizontal axis: t]

QUESTIONS:

1. How many of the calculated z-statistics were more than two units away from the origin? How many of the t-statistics?

2. What did the distributions for the two statistics look like? Compare their centers, spread, and overall shape.

3. Would you describe the t-distribution as bell-shaped? If so, would you say it is approximately normal? (i.e., is the z-score plot of the t-statistic a straight line?)

4. If you were to increase n, would you expect the difference between the two distributions to increase or decrease?

ASSIGNMENT: Do Exercise 9.64 in your text


CHAPTER 9 - LAB SESSION 2 ANALYZING THE POPULATION PROPORTION

INTRODUCTION: In this lab we will investigate the inferences that can be made about the binomial parameter p. Inferences concerning the population binomial parameter p are made using procedures that closely parallel the inference procedures for the population mean µ (see Chapter 9 Lab Session 1).

CONFIDENCE INTERVALS

Consider the following sample problem: A telephone survey was conducted to estimate the proportion of households with a personal computer. Of the 350 households surveyed, 75 had a personal computer. First, we will determine a point estimate for the proportion in the population who have a personal computer. The data to be entered will be a series of 0's and 1's, each number designating one of two categories.

Since the parameter of concern is the proportion of households with a personal computer, we use 1 to represent 'has a personal computer' and use 0 to represent 'does not have a personal computer'. The easiest way to enter the data is to enable the command editor and type in the following commands in the Session window.

MTB > SET C1
DATA> 75(1) 275(0)
DATA> END

This can also be accomplished in the worksheet by entering a 1 in the first row of C1, then clicking and holding the “+” in the lower right-hand corner of the cell and dragging through row 75. Then enter a 0 in row 76, and click, hold and drag to row 350.

Calculate the mean and store it in C2. This actually represents p’.


Now, if we wish to generate a confidence interval for p (let’s say 95%), do the following:

Menu Commands
Choose: Stat > Basic Statistics > 1 Proportion
Select: Sample in column radio button and enter C1
Select: Options
Enter: Confidence Interval: 95
(We can ignore Test proportion and Alternative, since we are not really interested in the hypothesis test.)
Check box: Use test and interval based on normal distribution
Click: OK OK


Test and CI for One Proportion: C1

Test of p = 0.5 vs p not = 0.5

Event = 1

Variable   X    N  Sample p          95% CI          Z-Value  P-Value
C1        75  350  0.214286  (0.171298, 0.257273)   -10.69    0.000
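The interval in this printout is the normal-approximation interval p' ± z·√(p'q'/n). As a sketch only (Python, with illustrative variable names), it can be reproduced like this:

```python
import math
from statistics import NormalDist

x, n = 75, 350          # households with a computer, households surveyed
level = 0.95            # confidence level

p_hat = x / n                                      # point estimate p'
se = math.sqrt(p_hat * (1 - p_hat) / n)            # estimated standard error
z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2) # two-sided critical value
ci = (p_hat - z_crit * se, p_hat + z_crit * se)

print(round(p_hat, 6), tuple(round(v, 6) for v in ci))
```

Changing `level` to 0.90 or 0.99 shows how the width of the interval responds to the confidence level.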


HYPOTHESIS TESTING

This sample problem will take you through the steps of entering the summarized data and performing a hypothesis test for exercise 9.105 in your textbook. This time we will not “simulate” the data; rather, we will use the summarized statistics given in the problem. The hypotheses for this test are H0: p = .9 vs Ha: p < .9

Menu Commands
Choose: Stat > Basic Statistics > 1 Proportion
Select: Summarized Data
Enter: Number of trials: 75
Enter: Number of successes: 55
Select: Options
Enter: Test Proportion: .9
Select: Alternative: less than
Click: OK OK

Test and CI for One Proportion

Test of p = 0.9 vs p < 0.9

                                  95% Upper
Sample   X   N  Sample p      Bound  Z-Value  P-Value
1       55  75  0.733333   0.817324    -4.81    0.000
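The Z-value in this output uses the hypothesized proportion in the standard error. A brief sketch of that calculation in Python (variable names are illustrative):

```python
import math
from statistics import NormalDist

x, n, p0 = 55, 75, 0.9     # successes, trials, hypothesized proportion

p_hat = x / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)   # standard error uses p0 under H0
p_value = NormalDist().cdf(z)                     # left tail, since Ha is p < .9

print(round(p_hat, 6), round(z, 2), p_value)
```

The printed p-value is tiny but not literally zero, which is worth keeping in mind when reading "P-Value 0.000" in the MINITAB output.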

1. What decision should be made based on these results?

2. What does P VALUE = 0.0000 tell us?

ASSIGNMENT: Do Exercises 9.107 and 9.109 in your text.


CHAPTER 9 - LAB SESSION 3 ANALYZING THE POPULATION VARIANCE

INTRODUCTION: In this lab we will present the hypothesis test for the standard deviation for a normal population. When sample data are skewed, just one outlier can greatly affect the standard deviation. It is very important, especially when using small samples, that the sampled population be normal; otherwise the procedures are not reliable. However, unlike the analysis for the mean, you will not have convenient computer commands to help you.

To use Example 9.19 as an example of using Minitab to aid in completion of the hypothesis test, let's assume the 12 samples tested yielded the following data:

165 172 180 189 181 174
165 185 211 170 198 171

Enter the data into C1, determine the descriptive statistics and do either a dot plot or histogram.

Descriptive Statistics: C1

Variable   N  N*    Mean  SE Mean  StDev  Minimum      Q1  Median      Q3  Maximum
C1        12   0  180.08     4.01  13.89   165.00  170.25  177.00  188.00   211.00

[Figure: Dotplot of C1]

Recall that the manufacturer claims “shelf life” is normally distributed. Why is this important?

The necessary expressions for completing the hypothesis test follow, using the calculator:

Store the hypothesized variance in C2:  10² (or 100)
Store the degrees of freedom in C3:  COUNT(C1) - 1 (or 11)
Store the standard deviation in C4:  STDEV(C1)
Store χ²* in C5:  (C3*(C4*C4))/C2

The spreadsheet will look like this:

 C1  hyp var  df        s    ChiSq
165      100  11  13.8922  21.2292
172
180
189
181
174
165
185
211
170
198
171

Compute the area under the curve to the left of χ²* by the following:

Menu Commands
Choose: Calc > Probability Distributions > Chi-Square
Select: Cumulative probability
Enter: Noncentrality parameter: 0.0
Enter: Degrees of freedom: 11
Select: Input constant
Enter: χ²* value (in this case 21.2292)
Click: OK

(Remember, our value of chi-square is different from the example, since we made up data for the problem. See the questions below about doing the test using summary statistics.)
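The calculator steps above can be mirrored in a few lines of Python. This is a sketch, not a MINITAB feature: the helper `chi2_cdf` is hypothetical and evaluates the chi-square cumulative probability with a series for the regularized incomplete gamma function, an approach chosen here only because the standard library has no chi-square distribution.

```python
import math
import statistics

def chi2_cdf(x, df):
    """P(chi-square <= x) via the series for the regularized lower incomplete gamma."""
    a, half_x = df / 2, x / 2
    term = 1 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= half_x / (a + k)
        total += term
        k += 1
    return total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))

shelf_life = [165, 172, 180, 189, 181, 174, 165, 185, 211, 170, 198, 171]
hyp_var = 10 ** 2                      # hypothesized variance (C2)

df = len(shelf_life) - 1               # degrees of freedom (C3)
s = statistics.stdev(shelf_life)       # sample standard deviation (C4)
chi_sq = df * s * s / hyp_var          # test statistic (C5)

left_tail = chi2_cdf(chi_sq, df)
right_tail = 1 - left_tail
print(round(s, 4), round(chi_sq, 4), round(left_tail, 4), round(right_tail, 4))
```

To redo the test from summary statistics alone, replace `s` and `df` with the values given in the example instead of computing them from the data.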


Remember this is the left tail. The right tail would be 1 - 0.968927 = 0.031073.

Questions:

1. What is the p-value?

2. What decision should be made?

3. Does your conclusion match that for Example 9.19? The p-value doesn’t match exactly because we made up data to fit the statistics and the data values don’t produce the given mean and standard deviation exactly. It is not necessary to have the raw data. Can you redo the test using only the summary statistics?

ASSIGNMENT: Do Exercises 9.137 and 9.144 in your text.


CHAPTER 10 LAB SESSION INFERENCES INVOLVING TWO POPULATIONS

INTRODUCTION: When comparing two populations we need two samples, one from each population. Two kinds of samples can be used: dependent or independent, determined by the source of the data. The methods of comparison are quite different.

CASE 1. DEPENDENT SAMPLE (PAIRED DATA):

The two data values, one from each set, that come from the same source are called paired data. They are compared by using the difference in their values, called the paired difference, d. Because the paired difference, d = x1 - x2, will be approximately normally distributed when paired observations are randomly selected from normal populations, we will use the t-test. We wish to make inferences about µd where the random variable (d) involved has an approximately normal distribution with an unknown standard deviation (σd).

Confidence Interval

Consider the data presented in exercise 10.16 of your text. Use MINITAB to generate the 95% confidence interval for the mean improvement in memory resulting from taking the memory course (d = after - before). Retrieve the data file for ex10-016 from the Student Suite CD. Using the calculator, form the paired difference using the expression C2 - C1 and store it in C3.

To generate the interval:

Menu Commands
Choose: Stat > Basic Statistics > 1-Sample t
Enter: Samples in columns: C3
Click: OK


The response is shown in the session window.

One-Sample T: C3

Variable   N     Mean    StDev  SE Mean         95% CI
C3        10  6.10000  4.79467  1.51621  (2.67010, 9.52990)

Hypothesis Testing

To demonstrate the procedure for a hypothesis test on the mean difference we will do Exercise 10.38. Enter the data for Before in column C1 and for After in column C2 by retrieving it from the Student Suite CD (ex10-038). Using the calculator, subtract the values in C1 from the values in C2 and place the paired differences in C3. Perform a t-test on the paired differences in C3.

Menu Commands
Choose: Stat > Basic Statistics > 1-Sample t
Enter: Variables: C3
Select: Test mean: 0.0
Click: OK

Results for: EX10-038.MTW

One-Sample T: C3

Test of mu = 0 vs not = 0

Variable   N     Mean    StDev  SE Mean          95% CI            T      P
C3        10  7.00000  5.79272  1.83182  (2.85614, 11.14386)  3.82  0.004

How would you interpret these results?

ASSIGNMENT: Do Exercises 10.19, 10.20, 10.22, 10.34 in your text.


CASE 2. INDEPENDENT SAMPLES:

If two samples are selected, one from each of the populations, the two samples are independent if the selection of objects from one population is unrelated to the selection of objects from the other population. Since the samples provide the information for determining the standard error, the t distribution will be used for the test statistic, and the degrees of freedom will be calculated by MINITAB. The TWOSAMPLE command performs both the confidence interval and the hypothesis test at the same time.

a) Consider Exercise 10.49 in your text. Retrieve the data from the Student Suite CD, or enter the data so that the males are in C1 and the females are in C2. Then, to complete the 99% confidence interval:

Menu Commands
Choose: Stat > Basic Statistics > 2-Sample t
Select: Samples in different columns
Enter: First: C1
Enter: Second: C2
Click: Options
Enter: Confidence level: 99
Click: OK OK


The response appears in the session window.

Verify these results by doing the calculations.

b) Complete the hypothesis test presented in Exercise 10.72 of your text. Retrieve the data from the Student Suite CD and note that the data for Diet A is in C1 and Diet B is in C2.

Menu Commands
Choose: Stat > Basic Statistics > 2-Sample t
Select: Samples in different columns
Enter: First: C1
Enter: Second: C2
Select: Alternative: less than
Click: OK


The results appear in the session window.

Do the data justify the conclusion that the mean weight gained on diet B was greater than the mean weight gained on diet A, at the α = .05 level of significance?

ASSIGNMENT: Do Exercises 10.76 and 10.79 in your text. Both sets of data are found on the Student Suite CD.

Enrichment Assignment: Do Exercise 10.80 or 10.81. Turn in a typed paper detailing your procedures and results. Include the session commands you used and a printed copy of your output to substantiate your conclusions. Review the Chapter 1 Lab Session on how to record your session commands and print out results. Remember you can output your results to a file and then import that file into your word processor to make report writing easier.

COMPARING TWO PROPORTIONS USING TWO INDEPENDENT SAMPLES

Confidence Interval

Consider exercise 10.85. We are interested in estimating the difference in the proportion of male and female teenagers who have ever gambled. The sample evidence given is that 66% of the 200 males (x = 132) and 37% of the 199 females (x = 74) have “ever gambled”.

Menu Commands
Choose: Stat > Basic Statistics > 2-Proportions
Select: Summarized Data
Enter: First: 200 (trials) 132 (events)
Enter: Second: 199 (trials) 74 (events)
Select: Options
Enter: Confidence level: 95%
Click: OK OK


The response is shown in the session window.

Test and CI for Two Proportions

Sample    X    N  Sample p
1       132  200  0.660000
2        74  199  0.371859

Difference = p (1) - p (2)
Estimate for difference: 0.288141
95% CI for difference: (0.194231, 0.382051)
Test for difference = 0 (vs not = 0): Z = 6.01  P-Value = 0.000
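The two-proportion results can be checked with a short calculation. The sketch below (Python, illustrative names) computes the interval and Z shown in the printout using separate sample proportions, and also the Z that results when a pooled estimate of p is used in the standard error, which is the option selected in the hypothesis test that follows.

```python
import math
from statistics import NormalDist

x1, n1 = 132, 200    # males who have ever gambled, males sampled
x2, n2 = 74, 199     # females who have ever gambled, females sampled

p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2

# Unpooled standard error: used for the confidence interval and the Z printed above
se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_crit = NormalDist().inv_cdf(0.975)
ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
z_unpooled = diff / se_unpooled

# Pooled standard error: used when "Use pooled estimate of p for test" is selected
p_pool = (x1 + x2) / (n1 + n2)
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_pooled = diff / se_pooled

print(round(diff, 6), tuple(round(v, 6) for v in ci))
print(round(z_unpooled, 2), round(z_pooled, 2))
```

The pooled and unpooled statistics differ only in how the standard error is estimated; the estimated difference itself is the same.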

Hypothesis Test

Consider exercise 10.101.

Menu Commands
Choose: Stat > Basic Statistics > 2-Proportions
Select: Summarized Data
Enter: First: 200 (trials) 132 (events)
Enter: Second: 199 (trials) 74 (events)
Select: Options
Enter: Confidence level: 95%
Select: Alternative: not equal
Select: Use pooled estimate of p for test
Click: OK OK


The response is shown in the session window.

What conclusion should be reached?

ASSIGNMENT: Do Exercises 10.90 and 10.100 in your text.


CHAPTER 11 LAB SESSION ANALYZING ENUMERATIVE DATA

INTRODUCTION: The data used in this lab is enumerative; that is, the data is placed in categories and counted. The observed frequencies list exactly what happened in the sample. The expected frequencies represent the theoretical expected outcomes (what is expected to happen “on the average”). These expected values must always add up to n. When we perform a hypothesis test on these two sets of values we are really asking “how different are they?” If the difference is small, we may attribute it to the chance variation in the samples. However, if the difference is large, there may be a difference in the proportions in the population and we may reject the null hypothesis. We can use the χ² distribution in our test. We will first make inferences concerning multinomial experiments and then extend that to contingency tables.

MULTINOMIAL EXPERIMENTS

A multinomial experiment consists of n independent trials, each of whose outcomes fits into exactly one of k possible cells. The probability of each cell remains constant, and the sum of all the probabilities is 1. For multinomial experiments we will always use a right-tail critical region of the distribution. The expected frequency for each cell is obtained by multiplying the probability for that cell by the total number of trials, n.

We can program MINITAB to calculate the Chi-Square statistic by entering the data and the probability for each cell, then calculating the expected value and the Chi-Square contribution for each cell. We then need to sum each of these columns.

Let’s do Example 11-1 from the text, using MINITAB to do the calculations. Since there are seven sections, we can assume the probability of choosing any one of them is 1/7. So we will enter 0.142857 in seven rows of column C1. Enter the seven observed values from Table 11.3 on page 550 into column C2.

Calculate the expected values and place in C3 using the expression C1 * SUM(C2)

Calculate Chi-Square and place in C4 using the expression (C2-C3)**2/C3


Next calculate the sums of each column and place in columns C5 – C8. You should get the following output.

      C1  C2       C3       C4   Sum C1  Sum C2   Sum C3   Sum C4
0.142857  18  17.0000  0.05883  1.00000     119  119.000  12.9412
0.142857  12  17.0000  1.47058
0.142857  25  17.0000  3.76473
0.142857  23  17.0000  2.11766
0.142857   8  17.0000  4.76469
0.142857  19  17.0000  0.23530
0.142857  14  17.0000  0.52941
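The worksheet columns above correspond to a very short calculation. As a sketch in Python (the list names are illustrative), using the observed counts shown in C2:

```python
observed = [18, 12, 25, 23, 8, 19, 14]      # column C2
probabilities = [1 / 7] * len(observed)     # column C1, equal chance for each section

n = sum(observed)                                    # total number of trials
expected = [p * n for p in probabilities]            # column C3: C1 * SUM(C2)
contributions = [(o - e) ** 2 / e                    # column C4: (C2-C3)**2/C3
                 for o, e in zip(observed, expected)]
chi_sq = sum(contributions)                          # Sum C4

print(n, [round(e, 4) for e in expected], round(chi_sq, 4))
```

Unequal cell probabilities can be handled by replacing the `probabilities` list, as long as the entries still add up to 1.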

Compare the MINITAB results to the results given in the example.

ASSIGNMENT: Do Exercises 11.15, 11.21, and 11.22 in your text.

CONTINGENCY TABLES AND THE CHI SQUARE COMMAND

This command performs a test of H0 that there is no relationship between the row and column variables in a table. We will enter the integer information directly into the worksheet. The MINITAB output will show the observed values and the expected values in each cell, the calculated value of χ², and the degrees of freedom. Note that the observed values are entered into the columns. The expected values are calculated for us by MINITAB. (Remember, from your text, that these numbers are calculated by multiplying the appropriate row and column sums and dividing by the total number of trials, n. The formula for the cell in the ith row and jth column is (Ri x Cj) / n.) Enter the data from Example 11-6 to see how it works.

Menu Commands
Choose: Stat > Tables > Chisquare Test (Table in Worksheet)
Enter: Columns containing the table: C1 C2
Click: OK


The response is shown in the session window.

Chi-Square Test: Favor, Oppose

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

        Favor  Oppose  Total
    1     143      57    200
       101.60   98.40
       16.870  17.418

    2      98     102    200
       101.60   98.40
        0.128   0.132

    3      13      87    100
        50.80   49.20
       28.127  29.041

Total     254     246    500

Chi-Sq = 91.715, DF = 2, P-Value = 0.000
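The expected counts and the chi-square statistic in this printout follow directly from the (Ri x Cj) / n rule. A minimal sketch in Python, using the observed counts from the table above (names are illustrative):

```python
observed = [
    [143, 57],    # row 1: Favor, Oppose
    [98, 102],    # row 2
    [13, 87],     # row 3
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for row i, column j is (Ri x Cj) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

print(expected, round(chi_sq, 3), df)
```

The sum may differ from MINITAB's printed total in the third decimal place because of rounding.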


Let’s perform the procedure using the data from Exercise 11.45. First name your columns with the days of the week. Enter the data in the appropriate columns.

From the STAT menu, choose TABLES and then CHISQUARE. Select the columns C1 - C5

Perform the test to obtain the following output in the session window.

Chi-Square Test: Mon, Tues, Wed, Thurs, Friday

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

         Mon   Tues    Wed  Thurs  Friday  Total
    1     85     90     95     95      90    455
       91.00  91.00  91.00  91.00   91.00
       0.396  0.011  0.176  0.176   0.011

    2     15     10      5      5      10     45
        9.00   9.00   9.00   9.00    9.00
       4.000  0.111  1.778  1.778   0.111

Total    100    100    100    100     100    500

Chi-Sq = 8.547, DF = 4, P-Value = 0.073

You still have to frame the null and alternative hypotheses, set the criteria, and then, using the results from the above computer printout, draw your conclusion.

ASSIGNMENT: Do Exercises 11.50, 11.51, 11.68, 11.74 in your text.


CHAPTER 12 LAB SESSION ANALYSIS OF VARIANCE

INTRODUCTION: In earlier sessions you have examined and compared means from two samples. We will now practice a technique that tests hypotheses about several means. While we could compare the means in pairs as we have done before, the process could become too unwieldy to be of any use. Analysis of variance (ANOVA) allows us to test all the means at the same time to see if there is any significant difference between them.

The Logic Underlying The Anova Technique

We will be forming a comparison between two estimates of the population variance: one based on the variance within each set of data and the other between the sets of data. We will use the F distribution for this comparison. If there is relatively little difference within each group and a large difference between the sample means, we will reject the null hypothesis. (Remember we always word the null hypothesis to say “there is no difference...”). If there is a lot of variance within a group and little between groups, we cannot conclude that the population means are different. We also need to know that the groups under investigation are approximately normally distributed and independent.

ANOVA is presented as a table, and we need to define our terms in order to understand what the table is telling us. The Factor is the variable whose means we are interested in studying. When we first set up our data charts in MINITAB, each column will represent a different Level of the Factor we are examining. Each row will be a data value from repeated samplings, called a Replicate. The ANOVA table will represent the Factor as the first row of the table. The next row is the Error, followed by the Total. The columns will be the degrees of freedom (DF), the sum of squares (SS), and the mean square (MS), which is the ratio of the sum of squares to the degrees of freedom for the factor and for the error.

PERFORMING AN ANOVA ANALYSIS

This sample problem will take you through the steps of entering the data and generating the ANOVA table for Example 12-1 in your textbook. The FACTOR we are looking at is temperature and whether it has any effect on production. We will examine production at three different temperature levels: 68°, 72°, 76°. These levels form our columns. The production amounts are the replicates and form the rows of the data table. You can name the columns and enter the data directly into the worksheet. Make sure you have entered the data correctly.


If we do a DOTPLOT of the three columns, some interesting things are shown.

[Figure: Dotplot of TEMP 68, Temp 72 and Temp 76 on a common Data axis, titled "Sample Results on Temperature and Production"]

Note that the points within each level are fairly close, but the three levels hardly overlap at all. The command for generating the ANOVA table is as follows:

Menu Commands
Choose: Stat > ANOVA > Oneway (Unstacked)
Select: C1-C3
Enter: Responses in separate columns: C1 C2 C3
Click: OK


The response is shown in the session window.

One-way ANOVA: TEMP 68, Temp 72, Temp 76

Source  DF      SS      MS      F      P
Factor   2  84.500  42.250  44.47  0.000
Error   10   9.500   0.950
Total   12  94.000

S = 0.9747   R-Sq = 89.89%   R-Sq(adj) = 87.87%

                           Individual 95% CIs For Mean Based on Pooled StDev
Level    N    Mean  StDev  ---------+---------+---------+---------+
TEMP 68  4  10.250  1.258                            (---*---)
Temp 72  5   7.000  0.707               (---*---)
Temp 76  4   3.750  0.957  (---*---)
                           ---------+---------+---------+---------+
                                  5.0       7.5      10.0      12.5

Pooled StDev = 0.975
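The ANOVA table can be rebuilt approximately from the level summaries printed beneath it (N, Mean, StDev). The following Python sketch does that; because it starts from rounded standard deviations rather than the raw data of Example 12-1, small differences from the MINITAB values are to be expected. The dictionary name `levels` is illustrative.

```python
# (N, Mean, StDev) for each level, as printed in the MINITAB output
levels = {
    "TEMP 68": (4, 10.250, 1.258),
    "Temp 72": (5, 7.000, 0.707),
    "Temp 76": (4, 3.750, 0.957),
}

n_total = sum(n for n, _, _ in levels.values())
grand_mean = sum(n * m for n, m, _ in levels.values()) / n_total

# Between-level variation: how far each level mean is from the grand mean
ss_factor = sum(n * (m - grand_mean) ** 2 for n, m, _ in levels.values())
# Within-level variation: pooled from the level standard deviations
ss_error = sum((n - 1) * s ** 2 for n, _, s in levels.values())

df_factor = len(levels) - 1
df_error = n_total - len(levels)

ms_factor = ss_factor / df_factor
ms_error = ss_error / df_error
f_stat = ms_factor / ms_error

print(df_factor, df_error, round(ss_factor, 3), round(ss_error, 3), round(f_stat, 2))
```

A large F arises here because the between-level mean square is many times the within-level mean square, which matches what the dotplot suggests.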

Compare the output to the calculations in Example 12-1 in the text. Note in particular that the calculated value is F* = 44.47. To make our decision, we need to compare this to the critical value F(2,10,.05) = 4.10. Since 44.47 is far greater than 4.10, we reject the null hypothesis and conclude that at least one of the temperatures has an effect on the production level. The p-value given in the table can also be used to determine the conclusion. How would you interpret it?

Exercise 12.53 in the chapter exercises compares the stopping distances for four brands of tires. Using the data given there, is there sufficient evidence to conclude that there is a difference in the mean stopping distances at the α = .05 level? This data may be found on the Student Suite CD as Ex12-53.


Procedure:

a) State your null and alternative hypotheses.
b) Find your critical region and value for F.
c) 1) Enter your data in columns 1 - 4, naming them A, B, C, D respectively.
   2) Do a dotplot to get a feel for how the data interact.
   3) Perform an ANOVA to calculate F*. What does the p-value tell you? Explain.
d) Draw your conclusion about the null hypothesis and explain what it means to you. How would your conclusion change if α changed?

ASSIGNMENT: Do Exercises 12.28, 12.29, 12.51 and 12.55 in your text.


CHAPTER 13 LAB SESSION LINEAR REGRESSION ANALYSIS

INTRODUCTION: In an earlier lab, we looked at bivariate data and used the linear correlation coefficient to see if there was a relationship between the two variables. You also looked at a method of developing a line of best fit. In this lab we will look at a method of deciding whether the equation of that line is of any use to us in making point predictions and developing confidence intervals.

Before beginning this lab, you should review the commands for performing a regression analysis in Chapter 3 Lab Session 2. Use the data in Exercise 13.43 just to refresh your memory.

Enter x values in C1
Enter y values in C2

Menu Commands
Choose: Stat > Regression > Regression
Enter: Response: C2
Enter: Predictors: C1
Click: OK

Check your results in the session window.

Regression Analysis: y versus x

The regression equation is
y = - 13.4 + 2.30 x

Predictor     Coef  SE Coef      T      P
Constant   -13.414    7.168  -1.87  0.098
x           2.3028   0.1918  12.01  0.000

S = 10.1738   R-Sq = 94.7%   R-Sq(adj) = 94.1%

Analysis of Variance

Source          DF     SS     MS       F      P
Regression       1  14924  14924  144.19  0.000
Residual Error   8    828    104
Total            9  15752
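Several numbers in this printout are tied together, and seeing the links helps when reading regression output. The sketch below (Python, illustrative names) starts only from values printed above and recomputes the t statistic for the slope, the F statistic, R-Sq and S; it does not fit the regression itself, since the raw data from Exercise 13.43 are not reproduced here.

```python
import math

# Values read from the regression printout
coef_x, se_coef_x = 2.3028, 0.1918           # slope and its standard error
ss_reg, ss_err, ss_total = 14924, 828, 15752
df_reg, df_err = 1, 8

t_slope = coef_x / se_coef_x                 # T for the predictor x
ms_err = ss_err / df_err                     # mean square error, an estimate of sigma squared
f_stat = (ss_reg / df_reg) / ms_err          # F in the ANOVA table
r_sq = ss_reg / ss_total                     # R-Sq as a proportion
s = math.sqrt(ms_err)                        # S, the standard error of the estimate

print(round(t_slope, 2), round(f_stat, 2), round(r_sq, 3), round(s, 4))
print(round(t_slope ** 2, 2))                # close to F when there is one predictor
```

With a single predictor, the square of the slope's t statistic and the ANOVA F agree up to rounding, so the two tests lead to the same decision.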


There are several steps in doing a linear regression analysis. First, we obtain a least squares estimate for the model equation y = β0 + β1x + ε. Next, we need to check our assumptions about the random error component, ε. (The mean value of the experimental error is zero. We must also assume that the distribution of the y’s is approximately normal and that the variance σ² of the distribution of random errors is constant.) Note that an estimate of σ² can be obtained from the MINITAB printout (s²). Third, we assess the usefulness of the model by making inferences about the slope. Lastly, we can construct a confidence interval for our predictions. We will use Exercise 13.88 to demonstrate the procedure.

1. Do a scatterplot to visually check if there is a linear relationship.

Menu Commands
Choose: Graph > Scatterplot
Select: Simple
Click: OK
Enter: Y variables: C2
Enter: X variables: C3


2. Determine the correlation coefficient.

Menu Commands
Choose: Stat > Basic Statistics > Correlation
Enter: Variables: C3 C2
Click: OK

The answer will appear in the session window.

Correlations: Population, Settlement

Pearson correlation of Population and Settlement = 0.928

3. Find the Equation of the Line of Best Fit:

Menu Commands
Choose: Stat > Regression > Regression
Enter: Response: C2
Enter: Predictors: C3
Click: OK

The response appears in the session window.

Regression Analysis: Settlement versus Population

The regression equation is
Settlement = 0.047 + 0.879 Population

Predictor      Coef  SE Coef      T      P
Constant     0.0466   0.3724   0.13  0.901
Population  0.87936  0.05191  16.94  0.000

S = 1.93233   R-Sq = 86.2%   R-Sq(adj) = 85.9%

Analysis of Variance

Source          DF      SS      MS       F      P
Regression       1  1071.4  1071.4  286.95  0.000
Residual Error  46   171.8     3.7
Total           47  1243.2


Unusual Observations

Obs  Population  Settlement     Fit  SE Fit  Residual  St Resid
  5        31.9      25.000  28.081   1.436    -3.081   -2.38RX
  8         0.7       7.750   0.680   0.349     7.070    3.72R
 30        18.2      25.000  16.033   0.751     8.967    5.04RX

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.

Note the different values in the printout: find b₀, b₁, sb (the standard error of the slope), the calculated value of t*, and the p-value. Perform the hypothesis test using the information from the ANOVA results.

4. Form the confidence intervals.

Menu commands
Choose: Stat > Regression > Fitted Line Plot
Enter:  Response (Y): C2
        Predictor (X): C3
Select: Type of Regression Model: Linear
Click:  OK
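The St Resid column in the Unusual Observations table can be reproduced from the other columns. The sketch below assumes the usual definition of a standardized residual, the residual divided by its standard error sqrt(s² - (SE Fit)²). The function name is hypothetical.

```python
import math

s = 1.93233  # S from the regression printout

# (observation number, observed Settlement, Fit, SE Fit) from the table.
unusual = [
    (5, 25.000, 28.081, 1.436),
    (8, 7.750, 0.680, 0.349),
    (30, 25.000, 16.033, 0.751),
]


def standardized_residual(observed, fit, se_fit, s):
    """Residual divided by its standard error, sqrt(s^2 - se_fit^2)."""
    residual = observed - fit
    se_residual = math.sqrt(s ** 2 - se_fit ** 2)
    return residual / se_residual


results = {
    obs: round(standardized_residual(y, fit, se_fit, s), 2)
    for obs, y, fit, se_fit in unusual
}
for obs, st_resid in results.items():
    print(f"Obs {obs}: St Resid = {st_resid}")
```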


Now let’s go back and select some options:

Menu commands
Choose: Stat > Regression > Fitted Line Plot
Enter:  Response (Y): C2
        Predictor (X): C3
Select: Type of Regression Model: Linear
Click:  Options tab
Check:  Display confidence interval
        Display prediction interval
Enter:  Confidence level: 95.0
Click:  OK

We now get this graph. Answer the questions contained in 13.82 using this information.

ASSIGNMENT: Do Exercises 13.79, 13.87, 13.90 in your text.
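The confidence and prediction bands drawn on the fitted line plot are built point by point. As a sketch, the Python code below computes both 95% intervals at a single point, using the Fit and SE Fit values that the regression printout lists for observation 5. The critical value `t_crit` is an assumption: it is a t-table value for 46 degrees of freedom and does not come from the MINITAB output.

```python
import math

# Values from the regression printout for observation 5.
fit = 28.081     # fitted value (Fit)
se_fit = 1.436   # standard error of the fitted value (SE Fit)
s = 1.93233      # S from the regression output

# ASSUMPTION: t(0.025, df = 46) read from a t table, not from MINITAB.
t_crit = 2.0129

# Confidence interval for the mean response at this x.
half_ci = t_crit * se_fit
# Prediction interval for a single new response at this x.
half_pi = t_crit * math.sqrt(s ** 2 + se_fit ** 2)

confidence_interval = (fit - half_ci, fit + half_ci)
prediction_interval = (fit - half_pi, fit + half_pi)

print("95% CI: ({:.2f}, {:.2f})".format(*confidence_interval))
print("95% PI: ({:.2f}, {:.2f})".format(*prediction_interval))
```

The prediction interval is always the wider of the two, because it accounts for the scatter of an individual observation as well as the uncertainty in the fitted line.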


CHAPTER 14 LAB SESSION ELEMENTS OF NON-PARAMETRIC STATISTICS

INTRODUCTION: All the previous methods we have studied are parametric statistics: they are based on a population that has a certain distribution and can be applied only when special criteria are met. Non-parametric statistical methods can be applied when these criteria cannot be met and assumptions about the parent population (such as normality) cannot be made, since these techniques do not rely on the distribution of the parent population. Non-parametric methods tend, unfortunately, to waste information and are less sensitive than their parametric counterparts. This, however, can be compensated for by increasing the sample size. Non-parametric techniques are generally easier to apply and are only slightly less efficient than parametric techniques.

THE SIGN TEST

The sign test is one of the easiest tests to use, since it reduces the data to plus and minus signs. It can be used in a hypothesis test for a single median or for two dependent samples using paired differences. The basic concept is that because the median is the middle piece of data, with 50% of the data above it (represented by +) and 50% below it (represented by -), P(+) = 0.5 and P(-) = 0.5. The method is fairly simple: all zeros are discarded and the rest of the data are assigned plus and minus signs. The test statistic is the number of the less frequent sign. This is actually a binomial random variable (each outcome is either + or -) with a probability of 1/2. When the normal approximation is used, z is calculated by the formula

z = (x′ - n/2) / [(1/2)√n]

We will use the data from Exercise 14.3 as a sample for using MINITAB to perform a sign test.

a) State the hypotheses:
   H0: The median high temperature = 48
   Ha: The median high temperature ≠ 48

b) Set test criteria: α = 0.05

First open a new worksheet and place the data in column C1. Then, from the menu, choose the nonparametric one-sample sign test and enter 48 as the test median:


Menu Commands
Choose: Stat > Nonparametrics > 1-Sample Sign
Enter:  Variables: C1
Select: Test median: 48
Click:  OK

c) The results appear in the session window.

Note that the p-value is less than α. Notice that we have only 3 temperatures above the stated median and 16 below. The actual median of the sample is 45.5.

d) We therefore reject H0 in favor of Ha.

We can get the confidence interval by using the same commands, selecting the Confidence interval option, and entering the desired level of confidence. The response will be displayed in the session window.
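The sign test arithmetic for this example can be sketched in Python from the counts quoted above (3 values above the hypothesized median and 16 below). Two assumptions are made here: no observation equals 48, so n = 19, and x′ in the z formula is the count of the less frequent sign adjusted by a continuity correction of 1/2. The function name is hypothetical.

```python
import math


def sign_test_two_sided(n_plus, n_minus):
    """Return (x, exact two-sided p-value, z) for a sign test.

    x is the count of the less frequent sign. Zeros (ties) are assumed
    to have been discarded already.
    """
    n = n_plus + n_minus
    x = min(n_plus, n_minus)
    # Exact binomial tail P(X <= x) with p = 1/2, doubled for two sides.
    tail = sum(math.comb(n, k) for k in range(x + 1)) / 2 ** n
    p_exact = min(1.0, 2 * tail)
    # Normal approximation; x' = x + 1/2 is an ASSUMED continuity correction.
    x_adj = x + 0.5
    z = (x_adj - n / 2) / (0.5 * math.sqrt(n))
    return x, p_exact, z


x, p_exact, z = sign_test_two_sided(n_plus=3, n_minus=16)
print(f"x = {x}, exact p-value = {p_exact:.4f}, z = {z:.2f}")
```

With α = 0.05, a p-value this small leads to the same decision as in step d).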

ASSIGNMENT: Do Exercise 14.14 in your text.

The Sign test can also be used for paired differences with two dependent samples. Do Exercise 14.15 in your text.


THE MANN-WHITNEY TEST

This is an alternative to the t-test for two independent random samples in which the random variable is continuous (it is also called the Mann-Whitney-Wilcoxon test). By default, a two-sided test is performed. To do a one-sided test, select the alternative you want from the Alternative box. The test is carried out as follows: First, the two samples are ranked together, with the smallest observation given rank 1, the next smallest given rank 2, and so on. Then the sum of the ranks of the first sample is calculated. If the sum is small, it indicates that the observations from the first sample tend to be smaller than those from the second sample, and vice versa. The attained significance level of the test is calculated using a normal approximation (with a continuity correction factor).

The following problem demonstrates Example 14-6 in your text. We first name the columns, then enter the grades from exam A in C1 and the grades from exam B in C2.

Menu Commands

Choose: Stat > Nonparametrics > Mann-Whitney
Enter:  First sample: C1
        Second sample: C2
        Confidence level: 95.0
Select: Alternative: not equal
Click:  OK


The response will appear in the session window. Note that the p-value is not smaller than α, so we fail to reject H0.
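The ranking procedure described for the Mann-Whitney test can be sketched in Python as follows. This is a minimal illustration, not MINITAB's exact algorithm. It ranks the combined samples (tied values share the average rank), sums the ranks of the first sample, and uses a normal approximation with a continuity correction, without any adjustment of the variance for ties. The function names and the two small samples are hypothetical and are not the exam data from Example 14-6.

```python
import math


def average_ranks(values):
    """Rank values from smallest (rank 1); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney(sample_a, sample_b):
    """Return (W, two-sided p-value), W = rank sum of the first sample."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    w = sum(ranks[:n_a])
    mean_w = n_a * (n_a + n_b + 1) / 2
    sd_w = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    # Continuity correction of 0.5; no tie adjustment in this sketch.
    z = max(0.0, (abs(w - mean_w) - 0.5) / sd_w)
    p_two_sided = math.erfc(z / math.sqrt(2))
    return w, p_two_sided


# Hypothetical samples for illustration only.
w, p = mann_whitney([1, 2, 3], [4, 5, 6])
print(f"W = {w}, p-value = {p:.4f}")
```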

Example 14-7 is completed in a similar manner. Enter the data found on page 680. This time choose “less than” for the alternative. Complete the test.

ASSIGNMENT: Do Exercises 14.29, 14.30, and 14.33 in your text. Be sure to clearly state the hypotheses and test criteria. Each data set is on the Student Suite CD.


RUNS TEST FOR RANDOMNESS

How do we really know when a set of outcomes is truly random? It is not enough to count the number of outcomes; we must also look at the order in which those outcomes arise, that is, their arrangement. A run is a sequence of outcomes that have a common property. When that property changes, the current run ends and a new one begins with the new property. The random variable to be considered is V, the number of runs. Its critical values are found in Table 14. Example 14-10 is used to demonstrate the MINITAB technique:

a) State your hypotheses:
   H0: The numbers are random
   Ha: The numbers are not random

b) State criteria: A two-tailed test with α = 0.05 and critical values 2 and 10 from Table 14.

c) Perform the test: First enter the data and then choose Runs Test from the menu. Since there are 30 values, the median is at position 15.5 (halfway between the 15th and 16th ordered values), which here is 3.5.


Menu Commands
Choose: Stat > Nonparametrics > Runs Test
Enter:  Variables: C1
Select: Above and below:
Enter:  3.5

The solution will appear in the session window.

What would your conclusion be?

ASSIGNMENT: Do Exercises 14.31, 14.41, 14.44 and 14.47 in your text.
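Counting runs above and below a cut point such as the median is easy to sketch in Python. The code below classifies each value as above the cut point or not, counts the runs, and compares the count with its expected value using a large-sample normal approximation. Treating a value equal to the cut point as "below" is an assumption of this sketch. The function name and the short data list are hypothetical and are not the data of Example 14-10.

```python
import math


def runs_above_below(values, k):
    """Return (runs, n_above, n_below, expected_runs, two-sided p-value).

    ASSUMPTION: a value counts as 'above' when it is strictly greater than k;
    everything else counts as 'below'. Both groups must be non-empty.
    """
    above = [v > k for v in values]
    # A new run starts whenever the classification changes.
    runs = 1 + sum(1 for prev, cur in zip(above, above[1:]) if prev != cur)
    n_above = sum(above)
    n_below = len(above) - n_above
    n = n_above + n_below
    expected = 2 * n_above * n_below / n + 1
    variance = (2 * n_above * n_below * (2 * n_above * n_below - n)
                / (n ** 2 * (n - 1)))
    z = (runs - expected) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return runs, n_above, n_below, expected, p_two_sided


# Hypothetical data for illustration only.
runs, n_above, n_below, expected, p = runs_above_below([1, 5, 2, 6, 3, 7], k=3.5)
print(f"runs = {runs}, above = {n_above}, below = {n_below}, "
      f"expected = {expected:.1f}, p-value = {p:.3f}")
```

For small samples the normal approximation is rough, which is why the text compares V with the critical values from Table 14 instead.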


RANK CORRELATION

This test is a nonparametric alternative to the linear correlation coefficient. It is used to determine if there is a correlation between two rankings. Let's consider Exercise 14.60. With the two lists of data in C2 and C3, we determine the rankings for each list with the following menu choices.

Menu Commands
Choose: Data > Rank
Enter:  Rank data in: C2
        Store ranks in: C4
Click:  OK

Repeat the above commands for the data in C3 and store the ranks in C5.


Then, to determine the Spearman rank correlation coefficient for the two rankings:

Choose: Stat > Basic Statistics > Correlation
Enter:  Variables: C4 C5
Click:  OK

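Correlations: C4, C5

Pearson correlation of C4 and C5 = -0.291
P-Value = 0.385

MINITAB labels the result a Pearson correlation. Because it is computed on the ranks stored in C4 and C5, it is the Spearman rank correlation coefficient.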

What would your conclusion be?

ASSIGNMENT: Do Exercises 14.61 and 14.63 in your text.
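The two-stage procedure used for rank correlation (rank each column, then correlate the ranks) can be sketched in Python as follows. The function names and the data are hypothetical and are not taken from Exercise 14.60. Tied values are given the average of the ranks they occupy.

```python
import math


def average_ranks(values):
    """Rank values from smallest (rank 1); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Linear (Pearson) correlation coefficient of two equal-length lists."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_xx = sum((a - x_bar) ** 2 for a in x)
    ss_yy = sum((b - y_bar) ** 2 for b in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


def spearman_r(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data for illustration only.
list_1 = [10, 20, 30, 40, 50]
list_2 = [5, 4, 3, 2, 1]
r_s = spearman_r(list_1, list_2)
print(f"r_s = {r_s:.3f}")
```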