Chapter 1 Sampling and Data

Post on 15-Jan-2016

Page 1: Chapter 1 Sampling and Data. What Is (Are?) Statistics? Statistics (a discipline) is a science of dealing with data. It consists of tools and methods

Chapter 1

Sampling and Data

What Is (Are?) Statistics?

Statistics (a discipline) is the science of dealing with data. It consists of tools and methods to collect data, organize data, and interpret the information or draw conclusions from data.

Note: Statistics (plural) sometimes refers to particular values calculated from data. For instance, the mean, median, and percentage are statistics, since they are numbers calculated from a set of sample data.

Basic Terms

• Population: A collection, or set, of individuals or objects or events whose properties are to be analyzed.

• Sample: A subset of the population.

• Parameter: A numerical value summarizing all the data of an entire population, for instance, a population mean.

• Statistic: A numerical value summarizing the sample data, for instance, a sample mean.
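
The difference between a parameter and a statistic can be sketched in Python. The population below is made up for illustration (the heights, the seed, and the sample size of 50 are all assumptions, not data from these slides). The population mean plays the role of the parameter and the sample mean plays the role of the statistic.

```python
import random
import statistics

# Hypothetical population: heights in inches for 1,000 individuals (made-up values).
rng = random.Random(0)  # arbitrary seed, only so the run is repeatable
population = [rng.gauss(67, 3) for _ in range(1000)]

# Parameter: a numerical value summarizing the entire population.
population_mean = statistics.mean(population)

# Sample: a subset of the population, here 50 individuals chosen at random.
sample = rng.sample(population, 50)

# Statistic: a numerical value summarizing only the sample data.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two printed values will usually be close but not identical, which is the variation that inferential statistics has to account for.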

Two Areas of Statistics

• Descriptive Statistics: collection, presentation, and description of sample data.

• Inferential Statistics: making decisions and drawing conclusions about populations.

What is a Variable?

• Variables are characteristics recorded about each individual or thing.

• Each variable should have a name that identifies what has been measured.

What is an Observational Unit?

The person or thing on which the variable is observed or measured, such as a student in the class, is called the observational/experimental unit or simply a case.

What Are Data?

• Data can be numbers, record names, or other labels recorded for the observational unit.

• Not all data represented by numbers are numerical data (e.g., 1=male, 2=female where 1 and 2 are the indicators of gender).

Data Tables

• The following data table clearly shows the context of the data presented:

• Notice that this data table tells us the variables (columns) and observational units (rows) for these data.

What is Statistics Really About?

Statistics is about variation. Different observational units may have different data values for a variable. Statistics helps us to deal with variation in order to make sense of data.

Two Kinds of Variables

• Qualitative, or Attribute, or Categorical, Variable: A variable that identifies a category for each case, for example, gender.

Note: Arithmetic operations, such as addition and averaging, are not meaningful for data resulting from a qualitative variable.

• Quantitative, or Numerical, Variable: A variable that records measurements or amounts of something and must have measuring units, for example, height measured in inches.

Note: Arithmetic operations, such as addition and averaging, are meaningful for data resulting from a quantitative variable.
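
A short Python sketch of the two notes above, using made-up records (the five gender codes and heights are assumptions for illustration only). Averaging the heights gives a meaningful number. Averaging the gender codes runs without error but describes nothing, so counting categories is the sensible summary.

```python
import statistics
from collections import Counter

# Hypothetical records: gender coded as 1 = male, 2 = female; height in inches.
gender_codes = [1, 2, 2, 1, 2]
heights_in = [70.0, 64.5, 66.0, 71.5, 63.0]

# Quantitative variable: averaging is meaningful (result is in inches).
mean_height = statistics.mean(heights_in)

# Qualitative variable: the codes are only labels, so count the categories instead.
labels = {1: "male", 2: "female"}
gender_counts = Counter(labels[code] for code in gender_codes)

# Python will average the codes without complaint, but the result has no interpretation.
meaningless = statistics.mean(gender_codes)

print(mean_height)
print(dict(gender_counts))
print(meaningless)
```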

Subdividing Variables Further

• Qualitative and quantitative variables may be further subdivided:

Variable

– Qualitative: Nominal, Ordinal

– Quantitative: Discrete, Continuous

Key Definitions

• Nominal Variable: A qualitative variable that categorizes (or describes, or names) an element of a population, for example, color of a car purchased.

• Ordinal Variable: A qualitative variable that incorporates an ordered position, or ranking, for instance, the variable Age recorded in three possible categories: young, middle, and old.

• Discrete Variable: A quantitative variable that can assume a countable number of values. That is, the values are counts, for example, number of cars owned. So, a discrete variable can assume values corresponding to isolated points, such as the integers, along a number line.

• Continuous Variable: A quantitative variable whose values are measurements, such as height or weight. The precision of the values recorded for the variable depends on the measuring scale used. For example, a weight recorded as 120 lbs may actually be 120.1 lbs, 120.14 lbs, or 120.143 lbs, etc., if a more accurate scale is used for measuring. Therefore, a continuous variable can assume any value in an interval along a number line, including every possible value between any two values.

Important Reminders!

In many cases, discrete and continuous variables can be distinguished by determining whether the variable relates to a count or a measurement.

Discrete variables are usually associated with counting.

Continuous variables are usually associated with measurements.

Example

• Example: In a student evaluation of instruction at a large university, one question asks students to evaluate the statement “The instructor was generally interested in teaching” on the following scale:

1 = Disagree Strongly;

2 = Disagree;

3 = Neutral;

4 = Agree;

5 = Agree Strongly.

• Question: Is interest in teaching categorical or quantitative?

Example (cont.)

• Question: Is interest in teaching categorical or quantitative?

• These ratings have an order, but adding or subtracting two ratings has no meaning.

• We conclude that variables like interest in teaching are categorical and are ordinal variables.

Just because your variable’s values are numbers, don’t assume that it’s quantitative.

Data Collection

• First problem a statistician faces: how to obtain the data.

• Usually the data are sample data collected from a portion of the population. It is important to obtain good or representative sample data.

• Statistical inferences about the population are made based on statistics obtained from the sample data collected.

Biased Sampling

Biased Sampling Method: A sampling method that produces data that systematically differ from the sampled population.

An unbiased sampling method is one that is not biased.

Sampling methods that often result in biased samples:

• Volunteer sample: sample collected from those elements of the population which chose to contribute the needed information on their own initiative

• Convenience sample: sample selected from elements of a population that are easily accessible

Process of Data Collection

1. Define the objectives of the survey or experiment

– Example: Estimate the average length of time for anesthesia to wear off

2. Define the variable and population of interest

– Example: Length of time for anesthesia to wear off after surgery

3. Define the data-collection and data-measuring schemes. This includes sampling procedures, sample size, and the data-measuring device (questionnaire, scale, ruler, etc.)

4. Determine the appropriate descriptive or inferential data-analysis techniques

Methods Used to Collect Data

Data can be collected by performing an experiment, a survey, or a census:

Experiment: The investigator controls or modifies the environment and observes the effect on the variable under study

Census: A 100% survey. Every element of the population is listed. Seldom used: difficult and time-consuming to compile, and expensive.

Survey: Data are obtained by sampling some of the population of interest. The investigator does not modify the environment.

Sampling Frame: A list of the elements belonging to the population from which the sample will be drawn

Note: It is important that the sampling frame be representative of the population

Sample Design: The process of selecting sample elements from the sampling frame

Note: There are many different types of sample designs. Usually they all fit into two categories: judgment samples and probability samples.

Two Types of Sample Designs

Probability Samples: Samples in which the elements to be selected are drawn on the basis of probability. Each element in a population has a certain probability of being selected as part of the sample.

Judgment Samples: Samples that are selected on the basis of being “typical”

– Items are selected because the collector judges them to be representative of the population. The validity of the results from a judgment sample reflects the soundness of the collector’s judgment.

Probability Sampling

Probability sampling includes random sampling, systematic sampling, stratified sampling, proportional sampling, and cluster sampling.

Random Sampling

Random Sample: A sample selected in such a way that every element in the population has an equal probability of being chosen. In a simple random sample, all samples of size n also have an equal chance of being selected. Random samples are obtained either by sampling with replacement from a finite population or by sampling without replacement from an infinite population.

Notes:

Inherent in the concept of randomness: the next result (or occurrence) is not predictable

Proper procedure for selecting a random sample: use a random number generator or a table of random numbers

Example

Example: An employer is interested in the time it takes each employee to commute to work each morning. A random sample of 35 employees will be selected and their commuting time will be recorded.

1. There are 2712 employees

2. Each employee is numbered: 0001, 0002, 0003, etc., up to 2712

3. Using four-digit random numbers, a sample is identified: 1315, 0987, 1125, etc.
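
With a random number generator instead of a table, the selection in this example might look like the Python sketch below. The seed value is an arbitrary assumption used only to make the run repeatable, and `random.sample` draws without replacement, which is a choice made here so that no employee is picked twice.

```python
import random

NUM_EMPLOYEES = 2712   # employees are numbered 0001 through 2712
SAMPLE_SIZE = 35

rng = random.Random(2016)  # arbitrary seed, only for repeatability

# Draw 35 distinct employee numbers; every employee is equally likely to be chosen.
selected = rng.sample(range(1, NUM_EMPLOYEES + 1), SAMPLE_SIZE)

# Format the numbers as four-digit IDs, e.g. 987 becomes "0987".
selected_ids = [f"{number:04d}" for number in sorted(selected)]
print(selected_ids)
```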

Systematic Sampling

Systematic Sample: A sample in which every kth item of the sampling frame is selected, starting from a first element that is randomly selected from the first k elements

Note: The systematic technique is easy to execute. However, it has some inherent dangers when the sampling frame is repetitive or cyclical in nature. In these situations the results may not approximate a simple random sample.

Example

Suppose you want to obtain a systematic sample of 8 houses from a street of 120 houses.

• First, since 120/8 = 15, choose a random starting point between 1 and 15. Let’s say, 11.

• Then, choose every 15th house after the 11th house.

The houses selected are

11, 26, 41, 56, 71, 86, 101, and 116.
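
The house example can be reproduced with a small Python function. This is a minimal sketch: the function name `systematic_sample` and its `start` parameter are hypothetical, and it assumes the frame size is an exact multiple of the sample size, as it is for 120 houses and a sample of 8.

```python
import random

def systematic_sample(frame_size, sample_size, start=None):
    """Return the 1-based positions selected by systematic sampling.

    Assumes frame_size is an exact multiple of sample_size.
    """
    k = frame_size // sample_size          # sampling interval: every kth item
    if start is None:
        start = random.randint(1, k)       # random start among the first k items
    return [start + i * k for i in range(sample_size)]

# Fixing the start at 11 reproduces the list of houses in the example.
houses = systematic_sample(120, 8, start=11)
print(houses)
```

Calling `systematic_sample(120, 8)` without `start` picks the starting house at random between 1 and 15.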

Stratified Sampling

Stratified Random Sample: A sample obtained by stratifying or grouping the sampling frame and then selecting a fixed number of items from each of the strata/groups by means of a simple random sampling technique.
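
A minimal Python sketch of this definition, under stated assumptions: the sampling frame (three class years with made-up member IDs), the fixed count of 5 per stratum, and the function name `stratified_sample` are all hypothetical and chosen only for illustration.

```python
import random

def stratified_sample(strata, per_stratum, rng):
    """Draw a simple random sample of fixed size from every stratum."""
    return {
        name: rng.sample(members, per_stratum)
        for name, members in strata.items()
    }

# Hypothetical sampling frame, already grouped into strata.
strata = {
    "freshman": [f"F{i}" for i in range(1, 41)],
    "sophomore": [f"S{i}" for i in range(1, 31)],
    "junior": [f"J{i}" for i in range(1, 21)],
}

rng = random.Random(1)  # arbitrary seed, only for repeatability
stratified = stratified_sample(strata, per_stratum=5, rng=rng)

for name, chosen in stratified.items():
    print(name, chosen)
```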

Proportional Sampling

Proportional Sample (or Quota Sample): A sample obtained by stratifying the sampling frame and then selecting from each stratum a number of items in proportion to the size of that stratum (or by quota) by means of a simple random sampling technique

Example

Suppose that a company has 180 staff, broken down as follows:

Male, full time: 90

Male, part time: 18

Female, full time: 9

Female, part time: 63

We are asked to take a proportional sample of 40 staff, stratified according to the above categories.

• The first step is to calculate the percentage of staff in each group:

% male, full time = (90/180) x 100 = 0.5 x 100 = 50

% male, part time = (18/180) x 100 = 0.1 x 100 = 10

% female, full time = (9/180) x 100 = 0.05 x 100 = 5

% female, part time = (63/180) x 100 = 0.35 x 100 = 35

• This tells us that of our sample of 40, 50% should be male, full time; 10% should be male, part time; 5% should be female, full time; and 35% should be female, part time. Therefore,

50% of 40 is 20.

10% of 40 is 4.

5% of 40 is 2.

35% of 40 is 14.

We need to select 20 full time males, 4 part time males, 2 full time females, and 14 part time females.
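
The allocation arithmetic for this proportional sample can be checked with a few lines of Python. This is a sketch using the staff counts from the example. The use of `round` is an assumption that works here because every share comes out to a whole number. In general, rounded shares may not add up to the sample size and would need adjusting.

```python
# Staff counts per stratum, taken from the example.
staff = {
    "Male, full time": 90,
    "Male, part time": 18,
    "Female, full time": 9,
    "Female, part time": 63,
}
SAMPLE_SIZE = 40

total_staff = sum(staff.values())

# Each stratum receives a share of the sample equal to its share of the staff.
allocation = {
    group: round(SAMPLE_SIZE * count / total_staff)
    for group, count in staff.items()
}

for group, n in allocation.items():
    percent = 100 * staff[group] / total_staff
    print(f"{group}: {percent:.0f}% of staff, select {n}")
```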

Cluster Sampling

Cluster Sample: A sample obtained by first dividing the sampling frame into clusters and then randomly selecting some of the clusters. The sample then includes either all elements or a simple random sample of some of the elements in each of the clusters selected.

Note: The difference between stratified and cluster sampling: all strata are represented in the sample, but only a subset of the clusters is in the sample.
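
Both variants of cluster sampling can be sketched in Python. The frame below (6 city blocks of 10 households each), the function name `cluster_sample`, and the parameter `per_cluster` are hypothetical and chosen only for illustration.

```python
import random

def cluster_sample(clusters, num_clusters, rng, per_cluster=None):
    """Randomly pick some clusters, then take all or some elements of each.

    per_cluster=None keeps every element of each selected cluster;
    otherwise a simple random sample of that size is drawn within each one.
    """
    chosen_names = rng.sample(sorted(clusters), num_clusters)
    result = {}
    for name in chosen_names:
        members = clusters[name]
        if per_cluster is None:
            result[name] = list(members)
        else:
            result[name] = rng.sample(members, per_cluster)
    return result

# Hypothetical sampling frame: 6 city blocks with 10 households each.
clusters = {
    f"block {b}": [f"B{b}-H{h}" for h in range(1, 11)]
    for b in range(1, 7)
}

rng = random.Random(7)  # arbitrary seed, only for repeatability
all_in_cluster = cluster_sample(clusters, 2, rng)
some_in_cluster = cluster_sample(clusters, 2, rng, per_cluster=3)

print(all_in_cluster)
print(some_in_cluster)
```

Only 2 of the 6 blocks appear in each result, whereas a stratified sample would draw from all 6.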

Guideline for Planning a Statistical Study

1. Identify the individuals or objects involved.

2. Determine the variables and methods of measuring.

3. Decide to collect data from an entire population or a sample. If using a sample, decide on a sampling method.

4. Address issues of ethics, privacy, and confidentiality in planning for data collection.

5. Collect data.

6. Apply descriptive statistics methods (Chapters 1, 2, 3) and draw conclusions from the data collected using appropriate inferential statistics methods (Chapters 9, 10, 11).

7. Provide discussion and recommendations for future studies.

Probability & Statistics

• Probability is the science of making statements about what will occur when samples are drawn from a known population.

• Statistics is the science of organizing sample data and making inferences about the unknown population from which the sample is drawn.

Probability (Chapters 4, 5, 6, 7, 8) is a vehicle of statistics: it allows the accuracy of a statistical inference from sample data to a population to be justified by its chance of occurring. That is, we want to know the chance that a similar result will occur if the study is repeated many more times.

Comparison of Probability & Statistics

Probability: Properties of the population are assumed known. Answer questions about the sample based on these properties.

Statistics: Use information in the sample to draw a conclusion about the population

Example

Example: A jar of M&M’s contains 100 candy pieces, 15 of which are red. A handful of 10 is selected.

Probability question: What is the probability that 3 of the 10 selected are red?

Example: A handful of 10 M&M’s is selected from a jar containing 1000 candy pieces. Three M&M’s in the handful are red.

Statistics question: What is the proportion of red M&M’s in the entire jar?
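
The two questions can be worked in Python. This is a sketch under one assumption the slides do not state: the handful is treated as a simple random sample drawn without replacement, so the number of red pieces follows a hypergeometric distribution. The function name `hypergeom_pmf` is hypothetical.

```python
from math import comb

def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k successes) when drawing without replacement."""
    return (
        comb(successes, k)
        * comb(population - successes, draws - k)
        / comb(population, draws)
    )

# Probability question: the jar is known (100 pieces, 15 red), the sample is not.
p_three_red = hypergeom_pmf(3, population=100, successes=15, draws=10)
print(f"P(3 red in a handful of 10) = {p_three_red:.4f}")

# Statistics question: the sample is known (3 red out of 10), the jar is not.
# The sample proportion is the natural point estimate of the jar's proportion.
sample_proportion = 3 / 10
print(f"Estimated proportion of red in the jar = {sample_proportion}")
```

The probability comes out to about 0.13. The estimate of 0.3 for the second jar is only a point estimate. How far it may be from the true proportion is what the inferential chapters address.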