
Page 1: Handout 19 - Sampling Methods V20140307-1.0.0

Version 20140307-1.0.0

Handout 19 (CCIP)/Handout 25 (UEI) - Sampling Methods

There are four main reasons for selecting a sample as an inspector:

- Sampling allows an inspector to allocate resources to command-emphasis areas
- Saves time compared to trying to inspect the entire population
- Saves money compared to trying to inspect the entire population
- Analysis of a sample is less cumbersome and more practical than an analysis of the entire population

The sampling process begins by defining the frame. The frame is the listing of items that make up the population. Examples are wing members, squadron training records, number of primary aircraft authorized, etc.

Figure 1 below shows the two broad categories of sampling and some commonly used sampling methods under each one. The difference between them is that in probability sampling, every item has a chance of being selected, and that chance can be quantified. This is not true for non-probability sampling: an item's chance of being selected cannot be quantified, and some items may have no chance of selection at all.

Figure 1 – Sampling categories and methods


The advantages and disadvantages of non-probability and probability sampling are summarized below.

BIAS

In the comparison of non-probability versus probability sampling above, you will notice that one of the disadvantages of non-probability sampling is selection bias. So what is bias, and why do you care about it as an inspector?

Bias is a general statistical term meaning a systematic (not random) deviation from the true value. A bias of a measurement or a sampling procedure may pose a more serious problem for the inspector than random errors because it cannot be reduced by mere increase in sample size and averaging the outcomes (Berenson, Levine, Krehbiel, 2009).

For example, suppose your wing commander wants to know the wing members’ overall opinion of medical services provided by the medical group. A random sample of 200 individuals has been drawn from within the medical group only. If medical group personnel’s opinions about the quality of medical services differ from those of the rest of the wing, then such a poll is biased. Even if we increase the sample of medical group personnel to 500, the systematic error remains the same. The sampling strategy introduced bias because it is quite possible that unit pride as well as professional pride would create a deviation between medical group opinion and that of the total base population. The deviation equals the difference between the medical group’s opinion and the opinion of the whole wing population.

This is important to you as an inspector because sampling bias leads to a systematic distortion of the estimate of the sampled probability distribution. In layman’s terms, the answer you got from the sample does not reflect the real answer for the population. This means you could provide misleading information (good or bad) to your wing commander. This could result in your wing commander addressing a problem that doesn’t really exist or whose severity is minimal. Or, your wing commander may not take any corrective action when in reality some action is required. Also, referring back to Handout 2 – Quality Standards for Inspection and Evaluation, there are numerous statements saying that inspectors and inspection findings should be free from bias or be unbiased.

Sometimes due to resource constraints, you will be forced to use non-probability sampling techniques. In those cases, you must remember your sample will contain some sort of selection bias, the resulting answers cannot be used for statistical inference, and you must clearly communicate the limitations of the information to your wing commander.


Probability Sampling

Probability sampling involves the selection of a sample from a population, based on the principle of randomization or chance. Probability sampling is more complex, more time-consuming and usually more costly than non-probability sampling. However, because items from the population are randomly selected and each item's probability of inclusion can be calculated, reliable estimates can be produced along with estimates of the sampling error, and inferences can be made about the population.

The following are some common probability sampling methods:

•simple random sampling

•systematic sampling

•stratified sampling

•cluster sampling

Simple random sampling

In simple random sampling, each member of a population has an equal chance of being included in the sample. Also, each combination of members of the population has an equal chance of composing the sample. Those two properties are what define simple random sampling. To select a simple random sample, you need to list all of the items in the survey population.

Example 1: To draw a simple random sample of personnel from a wing, each member would need to be numbered sequentially. If there were 5,000 members in the wing and if the sample size were 360, then 360 numbers between 1 and 5,000 would need to be randomly generated by a computer. Each number will have the same chance of being generated by the computer (in order to fill the simple random sampling requirement of an equal chance for every item). The 360 wing members corresponding to the 360 computer-generated random numbers would make up the sample.
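Example 1 can be sketched with Python's standard library. `random.sample` draws without replacement, so every member (and every 360-member combination) has an equal chance of selection; the variable names here are illustrative:

```python
import random

# Frame: wing members numbered sequentially 1..5,000 (illustrative)
population = range(1, 5001)

# Draw 360 distinct member numbers, each with an equal chance of selection
sample_ids = random.sample(population, 360)
```

In practice you would record or seed the random number generator so the selection can be reproduced and audited.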

Simple random sampling is the easiest method of sampling and it is the most commonly used. Advantages of this technique are that it does not require any additional information on the frame (such as geographic areas) other than the complete list of members of the survey population along with information for contact. Also, since simple random sampling is a simple method and the theory behind it is well established, standard formulas exist to determine the sample size, the estimates and so on, and these formulas are easy to use.

On the other hand, this technique makes no use of auxiliary information present on the frame (e.g., number of employees in each business) that could make the design of the sample more efficient. And although it is easy to apply simple random sampling to small populations, it can be expensive and unfeasible for large populations because all elements must be identified and labeled prior to sampling.

Systematic sampling

Sometimes called interval sampling, systematic sampling means that there is a gap, or interval, between each selected item in the sample. In order to select a systematic sample, you need to follow these steps:

1. Number the items on your frame from 1 to N (where N is the total population size).

2. Determine the sampling interval (K) by dividing the number of items in the population by the desired sample size. For example, to select a sample of 100 from a population of 400, you would need a sampling interval of 400 ÷ 100 = 4. Therefore, K = 4. You will need to select one item out of every four items to end up with a total of 100 items in your sample.

3. Select a number between one and K at random. This number is called the random start and would be the first number included in your sample. Using the sample above, you would select a number between 1 and 4 from a table of random numbers or a random number generator. If you choose 3, the third item on your frame would be the first item included in your sample; if you choose 2, your sample would start with the second item on your frame.

4. Select every Kth (in this case, every fourth) item after that first number. For example, the sample might consist of the following items to make up a sample of 100: 3 (the random start), 7, 11, 15, 19...395, 399 (up to N, which is 400 in this case).
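The four steps above can be sketched as a small Python function (assuming, as in the example, that the population size is a whole multiple of the sample size):

```python
import random

def systematic_sample(N, n):
    """Return 1-based item numbers for a systematic sample of n items from N."""
    k = N // n                           # step 2: sampling interval K
    start = random.randint(1, k)         # step 3: random start between 1 and K
    return list(range(start, N + 1, k))  # step 4: every Kth item thereafter

sample = systematic_sample(400, 100)     # e.g., start 3 gives 3, 7, 11, ..., 399
```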

Using the example above, you can see that with a systematic sample approach there are only four possible samples that can be selected, corresponding to the four possible random starts:

1, 5, 9, 13... 393, 397

2, 6, 10, 14... 394, 398

3, 7, 11, 15... 395, 399

4, 8, 12, 16... 396, 400

Each member of the population belongs to only one of the four samples and each sample has the same chance of being selected. From that, we can see that each item has a one-in-four chance of being selected in the sample. This is the same probability as if a simple random sample of 100 items were selected. The main difference is that with simple random sampling, any combination of 100 items could make up the sample, while with systematic sampling, there are only four possible samples. The precision of systematic sampling relative to simple random sampling therefore depends on the population's order on the frame, since that order determines the possible samples. If the population is randomly distributed on the frame, then systematic sampling should yield results that are similar to simple random sampling.

This method is often used in industry, where an item is selected for testing from a production line to ensure that machines and equipment are of a standard quality. For example, a tester in a manufacturing plant might perform a quality check on every 20th product in an assembly line. The tester might choose a random start between the numbers 1 and 20. This will determine the first product to be tested; every 20th product will be tested thereafter.

Example 2: Imagine you have to conduct a survey on base housing for your wing. Your wing has an adult base housing population of 1,000 and you want to take a systematic sample of 200 adult base residents. In order to do this, you must first determine what your sampling interval (K) would be:

Total population (N) ÷ sample size (n) = sampling interval (K)

N ÷ n = K

1,000 ÷ 200 = K

5 = K

To begin this systematic sample, all adult base residents would have to be assigned sequential numbers. The starting point would be chosen by selecting a random number between 1 and 5. If this number were 3, then the 3rd resident on the list would be selected, along with every 5th resident thereafter. The sample would consist of the residents corresponding to numbers 3, 8, 13, 18, 23....

The advantages of systematic sampling are that the sample selection cannot be easier (you only get one random number—the random start—and the rest of the sample automatically follows) and that the sample is distributed evenly over the listed population. The biggest drawback of the systematic sampling method is that if there is some cycle in the way the population is arranged on a list and if that cycle coincides in some way with the sampling interval, the possible samples may not be representative of the population.

Stratified sampling

Using stratified sampling, the population is divided into homogeneous, mutually exclusive groups called strata, and then independent samples are selected from each stratum. Any of the sampling methods mentioned in this lesson can be used to sample within each stratum. The sampling method can vary from one stratum to another. When simple random sampling is used to select the sample within each stratum, the sample design is called stratified simple random sampling. A population can be stratified by any variable that is available for all items on the sampling frame prior to sampling (e.g., rank, age, sex, on base/off base, Active/Guard/Reserve, income, etc.).

Figure 2 – Example of Strata used in Stratified Sampling

Why do we need to create strata? There are many reasons, the main one being that it can make the sampling strategy more efficient. It was mentioned earlier that you need a larger sample to get a more accurate estimation of a characteristic that varies greatly from one item to the other than for a characteristic that does not. For example, if every person in a population had the same salary, then a sample of one individual would be enough to get a precise estimate of the average salary.

This is the idea behind the efficiency gain obtained with stratification. If you create strata within which items share similar characteristics (e.g., income) and are considerably different from items in other strata (e.g., occupation, type of dwelling) then you would only need a small sample from each stratum to get a precise estimate of total income for that stratum. Then you could combine these estimates to get a precise estimate of total income for the whole population. If you were to use a simple random sampling approach in the whole population without stratification, the sample would need to be larger than the total of all stratum samples to get an estimate of total income with the same level of precision.

Stratified sampling ensures an adequate sample size for sub-groups in the population of interest. When a population is stratified, each stratum becomes an independent population and you will need to decide the sample size for each stratum.

Example 3: Suppose you need to determine how many focus groups by rank strata you need to get an understanding of the perceptions, attitudes and beliefs in your wing. In order to select a stratified simple random sample, you need to follow these steps:


1. Determine the total wing population: 5,000 members.

2. Determine the subpopulations based on your rank strata.

a. Field grade officers – 400
b. Company grade officers – 600
c. Senior NCOs – 500
d. NCOs – 1,700
e. Airmen – 1,800

3. Determine the proportional percentage of each subpopulation.

a. Field grade officers – 400/5,000 = 8%
b. Company grade officers – 600/5,000 = 12%
c. Senior NCOs – 500/5,000 = 10%
d. NCOs – 1,700/5,000 = 34%
e. Airmen – 1,800/5,000 = 36%

4. Assuming your total sample size needed was 360 members, multiply the proportional percentage for each rank stratum by the total sample size to determine each subpopulation sample size.

a. Field grade officers – 360 × 8% = 29
b. Company grade officers – 360 × 12% = 43
c. Senior NCOs – 360 × 10% = 36
d. NCOs – 360 × 34% = 122
e. Airmen – 360 × 36% = 130

5. Select a simple random sample from each subpopulation based on the numbers in step 4. This will give you a stratified simple random sample of 360 members.

6. Assuming you want to keep your focus groups to a maximum of 10 participants per group, use the numbers in step 4 to determine how many focus groups you need for each rank stratum.

a. Field grade officers – 29/10 = 3 groups
b. Company grade officers – 43/10 = 5 groups
c. Senior NCOs – 36/10 = 4 groups
d. NCOs – 122/10 = 13 groups
e. Airmen – 130/10 = 13 groups
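The arithmetic in steps 3, 4, and 6 can be sketched as follows. Note that simple rounding happens to total exactly 360 in this example; in general, the rounded allocations may need a small adjustment to hit the target sample size:

```python
import math

# Rank strata and their sizes from Example 3
strata = {"Field grade officers": 400, "Company grade officers": 600,
          "Senior NCOs": 500, "NCOs": 1700, "Airmen": 1800}
total = sum(strata.values())   # 5,000 wing members
sample_size = 360

# Step 4: proportional allocation of the sample across strata
allocation = {name: round(sample_size * count / total)
              for name, count in strata.items()}

# Step 6: number of focus groups of at most 10 participants each
groups = {name: math.ceil(n / 10) for name, n in allocation.items()}
```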

Stratification is most useful when the stratifying variables are

•simple to work with,

•easy to observe, and

•closely related to the topic of interest


A word of caution on the next technique you are about to read. This technique only estimates the probability of missing a potentially important perception, belief or attitude. It does not estimate the percentage of a target population who hold a particular perception, belief or attitude. It will tell you that you have issues, but it will not tell you how widespread they are. It will discover issues, but not measure them. Once you have uncovered the set of perceptions, beliefs or attitudes within an organization using this technique, you would then have to perform additional data collection and analysis to determine how widespread or important the individual issues are within the organization.

An alternative method to determine the number of focus groups is to randomly select 30 members from each rank stratum and then divide each stratum into 3 focus groups of 10. Based on the rank strata in Example 3, you would create 15 focus groups of 10 people (150 personnel total). Your focus groups would look like this:

a. Field grade officers – 3 x 10-person focus groups
b. Company grade officers – 3 x 10-person focus groups
c. Senior NCOs – 3 x 10-person focus groups
d. NCOs – 3 x 10-person focus groups
e. Airmen – 3 x 10-person focus groups

Choosing 30 from each group means there is less than a 5% chance that you have missed an attitude, perception, or belief with an incidence rate of 10% within the population. Also, if you felt gender might play a role in the issues identified, you would need to create six additional focus groups (3 groups of 10 men and 3 groups of 10 women). The complete logic behind this approach and its proper applications are contained in the article “Sample Size for Qualitative Research” by Peter DePaulo in Attachment 1. A sample size calculator based on this approach is in Attachment 2.
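The "less than 5%" figure follows from a simple probability argument: if a perception is held by 10% of the population, the chance that none of 30 randomly selected members holds it is (1 − 0.10) raised to the 30th power. A quick check:

```python
incidence = 0.10   # the perception is held by 10% of the population
n = 30             # members randomly selected per rank stratum

# Chance that a random sample of n contains nobody holding the perception
p_miss = (1 - incidence) ** n   # about 0.042, i.e., under 5%
```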

Cluster sampling

Sometimes it is too expensive to spread a sample across the population as a whole. Travel costs can become expensive if interviewers have to survey people from one end of the country to the other. To reduce costs, statisticians may choose a cluster sampling technique.

Cluster sampling divides the population into groups or clusters. A number of clusters are selected randomly to represent the total population, and then all items within selected clusters are included in the sample. If clusters are large, a probability-based sample taken from a single cluster is all that is needed. No items from non-selected clusters are included in the sample—they are represented by those from selected clusters. This differs from stratified sampling, where some items are selected from each group.

Examples of clusters are squadrons (fighter, communication, comptroller, etc), groups (operations, maintenance, medical, logistics) and geographic areas such as housing, flight line, north base, south base etc. The selected clusters are used to represent the population.

Example 4: Suppose your wing commander wants to find out the general readiness of personal mobility bags across the wing. It would be too costly and lengthy to inspect every personal mobility bag in the wing. Instead, 10 squadrons are randomly selected from all over the wing. These squadrons provide clusters of samples. Then every personal mobility bag in all 10 clusters is inspected. In effect, the bags in these clusters represent all bags in the wing.
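Example 4 can be sketched as two stages: randomly select the clusters, then take every item within them. The squadron names and roster sizes below are made up for illustration:

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical frame: 25 squadrons (clusters), each with 80-120 mobility bags
squadrons = {f"SQ-{i:02d}": [f"SQ-{i:02d}-bag-{j}"
                             for j in range(random.randint(80, 120))]
             for i in range(25)}

selected = random.sample(sorted(squadrons), 10)             # randomly select 10 clusters
sample = [bag for sq in selected for bag in squadrons[sq]]  # inspect every bag in them
```

The final sample size varies with the sizes of the clusters that happen to be drawn, rather than being fixed in advance.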

As mentioned, cost reduction is a reason for using cluster sampling. It creates 'pockets' of sampled items instead of spreading the sample over the whole population. Another reason is that sometimes a list of all items in the population (a requirement when conducting simple random sample, systematic sample or sampling with probability proportional to size) is not available, while a list of all clusters is either available or easy to create.

In most cases, the main drawback is a loss of efficiency when compared with simple random sampling. It is usually better to survey a large number of small clusters instead of a small number of large clusters. This is because neighboring items tend to be more alike, resulting in a sample that does not represent the whole spectrum of opinions or situations present in the overall population. In the previous examples, the readiness of personal mobility bags in the same squadron or group may be similar due to leadership emphasis within that cluster, deployment taskings, etc.

Another drawback to cluster sampling is that you do not have total control over the final sample size. Since not all squadrons have the same number of people and you must inspect every bag in your sample, the final sample size may be larger or smaller than you expected or needed.

Non-probability Sampling

The difference between probability and non-probability sampling has to do with a basic assumption about the nature of the population under study. In probability sampling, every item has a chance of being selected. In non-probability sampling, there is an assumption that characteristics are evenly distributed within the population. This is what makes the inspector believe that any sample would be representative and that, because of this, results will be accurate. For probability sampling, randomization is a feature of the selection process, rather than an assumption about the structure of the population.

In non-probability sampling, since elements are chosen arbitrarily, there is no way to estimate the probability of any one element being included in the sample. Also, no assurance is given that each item has a chance of being included, making it impossible either to estimate sampling variability or to identify possible bias.

Reliability cannot be measured in non-probability sampling; the only way to address data quality is to compare some of the survey results with available information about the population. Still, there is no assurance that the estimates will meet an acceptable level of error. Statisticians are reluctant to use these methods because there is no way to measure the precision of the resulting sample.

Despite these drawbacks, non-probability sampling methods can be useful when descriptive comments about the sample itself are desired. Secondly, they are quick, inexpensive and convenient. There are also other circumstances, such as in applied social research, when it is unfeasible or impractical to conduct probability sampling.

Most non-probability sampling methods require some effort and organization to complete, but others, like convenience sampling, are done casually and do not need a formal plan of action. The most common types are listed below:

•purposive sampling

•convenience or haphazard sampling

•volunteer sampling

•judgment sampling

•quota sampling

Purposive sampling

Purposive sampling involves taking a sample with a specific purpose or objective in mind. For example, under current fiscal constraints, IG organizations do not have the manpower and money to inspect all programs and associated items. This requires them to select a sample of programs for inspection that provides the greatest return on inspection investment dollar. Some factors that will drive this purposive sampling approach will be:

- Special Interest Items (SIIs)


- Command-interest areas or command-emphasis areas
- Likelihood of program failure
- Impact on the mission or people if a program fails
- Negative indicators that there may be issues with a program

Based on these factors, the IG organization should strive to select a purposive sample containing the high-risk programs with negative indicators, so that there is a maximum return on inspection dollars.

Convenience or haphazard sampling

Convenience sampling is sometimes referred to as haphazard or accidental sampling. It is not normally representative of the target population because sample items are only selected if they can be accessed easily and conveniently.

There are times when the average person uses convenience sampling. A food critic, for example, may try several appetizers or entrees to judge the quality and variety of a menu. And television reporters often seek so-called ‘people-on-the-street interviews’ to find out how people view an issue. In both these examples, the sample is chosen haphazardly, without use of a specific sampling method.

The obvious advantage is that the method is easy to use, but that advantage is greatly offset by the presence of bias. Although useful applications of the technique are limited, it can deliver accurate results when the population is homogeneous.

For example, a scientist could use this method to determine whether a lake is polluted. Assuming that the lake water is well-mixed, any sample would yield similar information. A scientist could safely draw water anywhere on the lake without fretting about whether or not the sample is representative.

Examples of convenience sampling include:

•the first row of mobility bags sitting in the first row of a mobility warehouse

•the first 100 military members to enter the mobility processing line

•the first 50 customers to pass through the chow hall

•the first 10 travel vouchers processed that day.

Volunteer sampling

As the term implies, this type of sampling occurs when people volunteer their services for the study. In psychological experiments or pharmaceutical trials (drug testing), for example, it would be difficult and unethical to enlist random participants from the general public. In these instances, the sample is taken from a group of volunteers. Sometimes, the inspector offers payment to entice respondents. In exchange, the volunteers accept the possibility of a lengthy, demanding or sometimes unpleasant process.

Sampling voluntary participants as opposed to the general population may introduce strong biases. Often in opinion polling, only the people who care strongly enough about the subject one way or another tend to respond. The silent majority does not typically respond, resulting in large selection bias. Television and radio media often use call-in polls to informally query an audience on their views. Oftentimes, there is no limit imposed on the frequency or number of calls one respondent can make. So, unfortunately, a person might be able to vote repeatedly. It should also be noted that the people who contribute to these surveys might have different views than those who do not.

Judgment Sampling

This approach is used when you want a quick sample and you believe you are able to select a sufficiently representative sample for your purposes. You will use your own judgment to select what seems like an appropriate sample.

This method is highly liable to bias and error, as the inspector may make inexpert judgments and selections. You probably need to be experienced in research methods before you can make a fair judgment about the right sample. Judgment sampling is often a last-resort method that may be used when there is no time to do a proper study. In qualitative research, it is common and can be appropriate, as the inspector explores anthropological situations where the discovery of meaning can benefit from an intuitive approach.

Quota sampling

This is one of the most common forms of non-probability sampling. Sampling is done until a specific number of items (quotas) for various sub-populations have been selected. Since there are no rules as to how these quotas are to be filled, quota sampling is really a means for satisfying sample size objectives for certain sub-populations.

The quotas may be based on population proportions. For example, if there are 100 men and 100 women in a population and a sample of 20 are to be drawn to participate in a cola taste challenge, you may want to divide the sample evenly between the sexes—10 men and 10 women. Quota sampling can be considered preferable to other forms of non-probability sampling (e.g., judgment sampling) because it forces the inclusion of members of different sub-populations.
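The cola-taste example can be sketched as follows. Note there is no randomization: the first people encountered simply fill each quota (the roster below is hypothetical):

```python
# Hypothetical roster: 100 men and 100 women, in the order they happen to arrive
population = [("M", i) for i in range(100)] + [("F", i) for i in range(100)]
quotas = {"M": 10, "F": 10}

sample = []
counts = {"M": 0, "F": 0}
for sex, person in population:
    if counts[sex] < quotas[sex]:     # take whoever comes first until the quota fills
        sample.append((sex, person))
        counts[sex] += 1
```

Because selection depends on arrival order rather than chance, any individual's probability of selection is unknown, which is exactly the selection-bias problem described in this section.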

Quota sampling is somewhat similar to stratified sampling in that similar items are grouped together. However, it differs in how the items are selected. In probability sampling, the items are selected randomly, while in quota sampling it is usually left up to the interviewer to decide who is sampled. This results in selection bias. Thus, quota sampling is often used by market researchers (particularly for telephone surveys) instead of stratified sampling, because it is relatively inexpensive and easy to administer and has the desirable property of satisfying population proportions. However, it disguises potentially significant bias.

As with all other non-probability sampling methods, in order to make inferences about the population, it is necessary to assume that persons and/or things selected are similar to those not selected. Such strong assumptions are rarely valid.

The main difference between stratified sampling and quota sampling is how the sample is selected. Suppose you wanted to sample 15 Grade 10 students from a school: stratified sampling would select the students using a probability sampling method such as simple random sampling or systematic sampling. In quota sampling, no such technique is used. The 15 students might be selected by choosing the first 15 Grade 10 students to enter school on a certain day, or by choosing 15 students from the first two rows of a particular classroom. Keep in mind that students who arrive late or sit at the back of the class may hold different opinions from those who arrive earlier or sit in the front.

The main argument against quota sampling is that it does not meet the basic requirement of randomness. Some items may have no chance of selection, or the chance of selection may be unknown. Therefore, the sample may be biased. Quota sampling is generally less expensive than random sampling. It is also easy to administer, especially since the tasks of listing the whole population, randomly selecting the sample and following up on non-respondents can be omitted from the procedure. Quota sampling is an effective sampling method when information is urgently required, and it can be carried out independent of existing sampling frames. In many cases where the population has no suitable frame, quota sampling may be the only appropriate sampling method.


Attachment 1

Sample size for qualitative research: The risk of missing something important

Editor’s note: Peter DePaulo is an independent marketing research consultant and focus group moderator doing business as DePaulo Research Consulting, Montgomeryville, Pa.

In a qualitative research project, how large should the sample be? How many focus group respondents, individual depth interviews (IDIs), or ethnographic observations are needed?

We do have some informal rules of thumb. For example, Maria Krieger (in her white paper, “The Single Group Caveat,” Brain Tree Research & Consulting, 1991) advises that separate focus groups are needed for major segments such as men, women, and age groups, and that two or more groups are needed per segment because any one group may be idiosyncratic. Another guideline is to continue doing groups or IDIs until we seem to have reached a saturation point and are no longer hearing anything new.

Such rules are intuitive and reasonable, but they are not solidly grounded and do not really tell us what an optimal qualitative sample size may be. The approach proposed here gives specific answers based on a firm foundation.

First, the importance of sample size in qualitative research must be understood.

Size does matter, even for a qualitative sample

One might suppose that “N” (the number in the sample) simply is not very important in a qualitative project. After all, the effect of increasing N, as we learned in statistics class, is to reduce the sampling error (e.g., the +/- 3 percent variation in opinion polls with N = 1,000) in a quantitative estimate. Qualitative research normally is inappropriate for estimating quantities. So, we lack the old familiar reason for increasing sample size.

Nevertheless, in qualitative work, we do try to discover something. We may be seeking to uncover: the reasons why consumers may or may not be satisfied with a product; the product attributes that may be important to users; possible consumer perceptions of celebrity spokespersons; the various problems that consumers may experience with our brand; or other kinds of insights. (For lack of a better term, I will use the word “perception” to refer to a reason, need, attribute, problem, or whatever the qualitative project is intended to uncover.) It would be up to a subsequent quantitative study to estimate, with statistical precision, how important or prevalent each perception actually is.

The key point is this: Our qualitative sample must be big enough to assure that we are likely to hear most or all of the perceptions that might be important. Within a target market, different customers may have diverse perceptions. Therefore, the smaller the sample size, the narrower the range of perceptions we may hear. On the positive side, the larger the sample size, the less likely it is that we will fail to discover a perception we would have wanted to know about. In other words, our objective in designing qualitative research is to reduce the chances of discovery failure, as opposed to reducing (quantitative) estimation error.

Discovery failure can be serious

What might go wrong if a qualitative project fails to uncover an actionable perception (or attribute, opinion, need, experience, etc.)? Here are some possibilities:

• A source of dissatisfaction is not discovered - and not corrected. In highly competitive industries, even a small incidence of dissatisfaction could dent the bottom line.

• In the qualitative testing of an advertisement, a copy point that offends a small but vocal subgroup of the market is not discovered until a public-relations fiasco erupts.

• When qualitative procedures are used to pre-test a quantitative questionnaire, an undiscovered ambiguity in the wording of a question may mean that some of the subsequent quantitative respondents give invalid responses. Thus, qualitative discovery failure eventually can result in quantitative estimation error due to respondent miscomprehension.

Therefore, size does matter in a qualitative sample, though for a different reason than in a quantitative sample. The following example shows how the risk of discovery failure may be easy to overlook even when it is formidable.

Example of the risk being higher than expected

The managers of a medical clinic (name withheld) had heard favorable anecdotal feedback about the clinic’s quality, but wanted an independent evaluation through research. The budget permitted only one focus group with 10 clinic patients. All 10 respondents clearly were satisfied with the clinic, and group discussion did not reverse these views.

Did we miss anything as a result of interviewing only 10? Suppose, for example, that the clinic had a moody staff member who, unbeknownst to management, was aggravating one in 10 clinic patients. Also, suppose that management would have wanted to discover anything that affects the satisfaction of at least 10 percent of customers. If there really was an unknown satisfaction problem with a 10 percent incidence, then what was the chance that our sample of 10 happened to miss it? That is, what is the probability that no member of the subgroup defined as those who experienced the staffer in a bad mood happened to get into the sample?

At first thought, the answer might seem to be “not much” chance of missing the problem. The hypothetical incidence is “one in 10,” and we did indeed interview 10 patients. Actually, the probability that our sample failed to include a patient aggravated by the moody staffer turns out to be just over one in three (0.349 to be exact). This probability is simple to calculate: Consider that the chance of any one customer selected at random not being a member of the 10 percent (aggravated) subgroup is 0.9 (i.e., a nine in 10 chance). Next, consider that the chance of failing to reach anyone from the 10 percent subgroup twice in a row (by selecting two customers at random) is 0.9 × 0.9, or 0.9 to the second power, which equals 0.81. Now, it should be clear that the chance of missing the subgroup 10 times in a row (i.e., when drawing a sample of 10) is 0.9 to the tenth power, which is approximately 0.35. Thus, there is a 35 percent chance that our sample of 10 would have “missed” patients who experienced the staffer in a bad mood. Put another way, just over one in three random samples of 10 will miss an experience or characteristic with an incidence of 10 percent.
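The compounding of misses described above is easy to verify with a few lines of Python (the function name is my own, not part of the handout):

```python
# Probability that a simple random sample misses a subgroup entirely:
# each respondent drawn avoids the subgroup with probability
# (1 - incidence), and independent draws multiply.

def miss_probability(incidence: float, n: int) -> float:
    """Chance that none of n randomly sampled respondents belongs to a
    subgroup with the given population incidence."""
    return (1.0 - incidence) ** n

# The clinic example: a 10 percent subgroup and a sample of 10 patients.
print(round(miss_probability(0.10, 10), 4))  # 0.3487, just over one in three
```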

This seems counter-intuitively high, even to quant researchers to whom I have shown this analysis. Perhaps people implicitly assume the fallacy that if something has an overall frequency of one in N, then it is almost sure to appear in N chances.

Basing the decision on calculated probabilities

So, how can we figure the sample size needed to reduce the risk as much as we want? I am proposing two ways. One would be based on calculated probabilities like those in the table in Attachment 2, which was created by repeating the probability calculation described above for various incidences and sample sizes. The client and researcher would peruse the table and select a sample size that is affordable yet reduces the risk of discovery failure to a tolerable level.

For example, if the research team would want to discover a perception with an incidence as low as 10 percent of the population, and if the team wanted to reduce the risk of missing that subgroup to less than 5 percent, then a sample of N=30 would suffice, assuming random selection. (To be exact, the risk shown in the table is 0.0424, or about 4.2 percent.) This is analogous to having 95 percent confidence in being able to discover a perception with a 10 percent incidence. Remember, however, that we are expressing confidence in uncovering a qualitative insight, as opposed to the usual quantitative notion of “confidence” in estimating a proportion or mean plus or minus the measurement error.

If the team wants to be more conservative and reduce the risk of missing the one-in-10 subgroup to less than 1 percent (i.e., 99 percent confidence), then a sample of nearly 50 would be needed. This would reduce the risk to nearly 0.005 (see table).
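Rather than reading the table, the minimum sample size can be solved for directly by inverting the same formula; this sketch (function name mine) reproduces both thresholds just discussed:

```python
import math

def min_sample_size(incidence: float, max_miss_risk: float) -> int:
    """Smallest n such that (1 - incidence) ** n <= max_miss_risk,
    assuming simple random sampling."""
    return math.ceil(math.log(max_miss_risk) / math.log(1.0 - incidence))

print(min_sample_size(0.10, 0.05))  # 29: about 30 for 95 percent confidence
print(min_sample_size(0.10, 0.01))  # 44: "nearly 50" for 99 percent confidence
```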

What about non-randomness?

Of course, the table assumes random sampling, and qualitative samples often are not randomly drawn. Typically, focus groups are recruited from facility databases, which are not guaranteed to be strictly representative of the local adult population, and factors such as refusals (also a problem in quantitative surveys, by the way) further compromise the randomness of the sample.

Unfortunately, nothing can be done about subgroups that are impossible to reach, such as people who, for whatever reason, never cooperate when recruiters call. Nevertheless, we can still sample subgroups that are merely less likely to be reached, as long as the recruiter’s call has some chance of being received favorably (for example, people who are home only half as often as the average target customer but will still answer the call and accept our invitation to participate). We can compensate for their reduced likelihood of being contacted by treating their reachable incidence as half of their actual incidence. Specifically, if we wanted to allocate enough budget to reach a 10 percent subgroup even if it is twice as hard to reach, then we would suppose that its reachable incidence is as low as 5 percent and look at the 5 percent row in the table. If, for instance, we wanted to be very conservative, we would recruit 100 respondents, resulting in less than a 1 percent chance (0.006, to be exact) of missing a 5 percent subgroup (or a 10 percent subgroup that behaves like a 5 percent subgroup in likelihood of being reached).
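The reachability adjustment can be sketched the same way: discount the subgroup's incidence by its relative likelihood of being reached, then apply the usual miss-probability formula (both function names are mine, for illustration):

```python
def reachable_incidence(actual_incidence: float, relative_reach: float) -> float:
    """Discount a subgroup's incidence by how reachable it is relative
    to the average target customer (1.0 = equally reachable)."""
    return actual_incidence * relative_reach

def miss_probability(incidence: float, n: int) -> float:
    """Chance that a random sample of n misses the subgroup entirely."""
    return (1.0 - incidence) ** n

# A 10 percent subgroup that is only half as reachable behaves like a
# 5 percent subgroup; recruiting 100 respondents drops the miss risk
# below 1 percent.
effective = reachable_incidence(0.10, 0.5)          # 0.05
print(round(miss_probability(effective, 100), 3))   # 0.006
```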

An approach based on actual qualitative findings

The other way of figuring an appropriate sample size would be to consider the findings of a pair of actual qualitative studies reported by Abbie Griffin and John Hauser in an article, “The Voice of the Customer” (Marketing Science, Winter 1993). These researchers looked at the number of customer needs uncovered by various numbers of focus groups and in-depth interviews.

In one of the two studies, two-hour focus groups and one-hour in-depth interviews (IDIs) were conducted with users of a complex piece of office equipment. In the other study, IDIs were conducted with consumers of coolers, knapsacks, and other portable means of storing food. Both studies looked at the number of needs (attributes, broadly defined) uncovered for each product category. Using mathematical extrapolations, the authors hypothesized that 20-30 IDIs are needed to uncover 90-95 percent of all customer needs for the product categories studied.

As with typical learning curves, there were diminishing returns in the sense that fewer new (non-duplicate) needs were uncovered with each additional IDI. It seemed that few additional needs would be uncovered after 30 IDIs. This is consistent with the probability table (see Attachment 2), which shows that perceptions of all but the smallest market segments are likely to be found in samples of 30 or fewer.

In the office equipment study, one two-hour focus group was no better than two one-hour IDIs, implying that “group synergies [did] not seem to be present” in the focus groups. The study also suggested that multiple analysts are needed to uncover the broadest range of needs.

These studies were conducted within the context of quality function deployment, where, according to the authors, 200-400 “customer needs” are usually identified. It is not clear how the results might generalize to other qualitative applications.

Nevertheless, if one were to base a sample-size decision on the Griffin and Hauser results, the implication would be to conduct 20-30 IDIs and to arrange for multiple analysts to look for insights in the data. Perhaps backroom observers could, to some extent, serve as additional analysts by taking notes while watching the groups or interviews. The observers’ notes might contain some insights that the moderator overlooks, thus helping to minimize the chances of missing something important.

N=30 as a starting point for planning

Neither the calculation of probabilities in the prior table nor the empirical rationale of Griffin and Hauser is assured of being the last word on qualitative sample size. There might be other ways of figuring the number of IDIs, groups, or ethnographic observations needed to avoid missing something important.

Until the definitive answer is provided, perhaps an N of 30 respondents is a reasonable starting point for deciding the qualitative sample size that can reveal the full range (or nearly the full range) of potentially important customer perceptions. An N of 30 reduces the probability of missing a perception with a 10 percent incidence to less than 5 percent (assuming random sampling), and it is the upper end of the range found by Griffin and Hauser. If the budget is limited, we might reduce the N below 30, but the client must understand the increased risk of missing perceptions that may be worth knowing about. If the stakes and budget are high enough, we might go with a larger sample in order to ensure that smaller (or harder-to-reach) subgroups are still likely to be represented.

If focus groups are desired, and we want to count each respondent separately toward the N we choose (e.g., getting an N of 30 from three groups with 10 respondents in each), then it is important for every respondent to have sufficient air time on the key issues. Using mini groups instead of traditional-size groups could help achieve this objective. Also, it is critical for the moderator to control dominators and bring out the shy people, lest the distinctive perceptions of less-talkative customers are missed.

Across segments or within each one?

A complication arises when we are separately exploring different customer segments, such as men versus women, different age groups, or consumers in different geographic regions. In the case of gender and a desired N of 30, for example, do we need 30 in total (15 males plus 15 females) or do we really need to interview 60 people (30 males plus 30 females)? This is a judgment call, which would depend on the researchers’ belief in the extent to which customer perceptions may vary from segment to segment. Of course, it may also depend on budget. To play it safe, each segment should have its own N large enough so that appreciable subgroups within the segment are likely to be represented in the sample.

What if we only want the “typical” or “majority” view?

For some purportedly qualitative studies, the stated or implied purpose may be to get a sense of how customers feel overall about the issue under study. For example, the client may want to know whether customers “generally” respond favorably to a new concept. In that case, it might be argued that we need not be concerned about having a sample large enough to make certain that we discover minority viewpoints, because the client is interested only in how “most” customers react.

The problem with this agenda is that the “qualitative” research would have an implicit quantitative purpose: to reveal the attribute or point of view held by more than 50 percent of the population. If, indeed, we observe what “most” qualitative respondents say or do and then infer that we have found the majority reaction, we are doing more than “discovering” that reaction: We are implicitly estimating its incidence at more than 50 percent.

The approach I propose makes no such inferences. If we find that only one respondent in a sample of 30 holds a particular view, we make no assumption that it represents a 10 percent population incidence, although, as discussed later, it might be that high. The actual population incidence is likely to be closer to 3.3 percent (1/30) than to 10 percent. Moreover, to keep the study qualitative, we should not say that we have estimated the incidence at all. We only want to ensure that if there is an attribute or opinion with an incidence as low as 10 percent, we are likely to have at least one respondent to speak for it - and a sample of 30 will probably do the job.

If we do want to draw quantitative inferences from a qualitative procedure (and, normally, this is ill advised), then this paper does not apply. Instead, the researchers should use the usual calculations for setting a quantitative sample size at which the estimation error resulting from random sampling variations would be acceptably low.

Keeping qualitative pure

Whenever I present this sample-size proposal, someone usually objects that I am somehow “quantifying qualitative.” On the contrary, estimating the chances of missing a potentially important perception is completely different from estimating the percent of a target population who hold a particular perception. To put it another way, calculating the odds of missing a perception with a hypothetical incidence does not quantify the incidences of those perceptions that we actually do uncover.

Therefore, qualitative consultants should not be reluctant to talk about the probability of missing something important. In so doing, they will not lose their identity as qualitative researchers, nor will they need any “high math.” Moreover, by distinguishing between discovery failure and estimation error, researchers can help their clients fully understand the difference between qualitative and quantitative purposes. In short, the approach I propose is intended to ensure that qualitative will accomplish what it does best - to discover (not measure) potentially important insights.

Attachment 2

Sample Size Calculator for Probability of Missing a Subpopulation in a Focus Group


Probability of Missing a Subpopulation (“Factor”) in a Focus Group with Randomly Sampled Participants

                  Number of Participants
Incidence      10      20      30      40      50      60
   5%      0.5987  0.3585  0.2146  0.1285  0.0769  0.0461
  10%      0.3487  0.1216  0.0424  0.0148  0.0052  0.0018
  15%      0.1969  0.0388  0.0076  0.0015  0.0003  0.0001
  20%      0.1074  0.0115  0.0012  0.0001  0.0000  0.0000

Note 1: The numbers in the group-size row can be edited based on group sizes.
Note 2: The percentages in the incidence column can be edited to fine-tune acceptable incidence levels.
Note 3: Probabilities greater than 5% are indicated by gray text (excluded by “rule of thumb,” in that missing a factor more than five times in 100 is undesirable). An optimal outcome is indicated by red text; the probability of missing a factor with a 10% incidence rate in the population is 0.0424 when 30 participants are randomly sampled from the population.
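If other incidences or group sizes are of interest, the table can be regenerated with a short script (a sketch; the formatting choices are mine):

```python
# Recompute the miss-probability table: the chance that a random sample
# of n participants contains no one from a subgroup with the given
# incidence is (1 - incidence) ** n.
sizes = [10, 20, 30, 40, 50, 60]
print(f"{'Incidence':<10}" + "".join(f"{n:>8}" for n in sizes))
for incidence in (0.05, 0.10, 0.15, 0.20):
    row = "".join(f"{(1 - incidence) ** n:>8.4f}" for n in sizes)
    print(f"{incidence:<10.0%}" + row)
```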