discrete and random variables

INTRODUCTION

In statistics, numerical random variables represent counts and measurements. They come in

two different flavors: discrete and continuous, depending on the type of outcomes that are

possible:

Discrete random variables. If the possible outcomes of a random variable can be listed out

using a finite (or countably infinite) set of single numbers (for example, {0, 1, 2, ..., 10}; or

{-3, -2.75, 0, 1.5}; or {10, 20, 30, 40, 50, ...}), then the random variable is discrete.

Continuous random variables. If the possible outcomes of a random variable can only be

described using an interval of real numbers (for example, all real numbers from zero to

ten), then the random variable is continuous.

Discrete random variables typically represent counts — for example, the number of people who

voted yes for a smoking ban out of a random sample of 100 people (possible values are 0, 1, 2, .

. . , 100); or the number of accidents at a certain intersection over one year's time (possible

values are 0, 1, 2, . . .).

Discrete random variables have two classes: finite and countably infinite. A discrete random

variable is finite if its list of possible values has a fixed (finite) number of elements in it (for

example, the number of smoking ban supporters in a random sample of 100 voters has to be

between 0 and 100). One very common finite random variable is obtained from the binomial

distribution.

A discrete random variable is countably infinite if its possible values can be specifically listed out

but they have no specific end. For example, the number of accidents occurring at a certain

intersection over a 10-year period can take on possible values: 0, 1, 2, . . . (in theory, the

number of accidents can take on infinitely many values).

Continuous random variables typically represent measurements, such as time to complete a

task (for example 1 minute 10 seconds, 1 minute 20 seconds, and so on) or the weight of a

newborn. What separates continuous random variables from discrete ones is that they

are uncountably infinite; they have too many possible values to list out or to count and/or they

can be measured to a high level of precision (such as the level of smog in the air in Los Angeles

on a given day, measured in parts per million).

CONTROL CHARTS FOR ATTRIBUTE VARIABLES

Like the continuous variable control charts, the control chart for an attribute variable

also takes the form of a sideways, two-way comparison of a two-sided hypothesis test. The

difference between the continuous and attribute control charts lies in the underlying

distributions. Most continuous variables are well represented by the Normal Distribution, while

the attribute variables are typically modeled by the Binomial or Poisson Distributions.

Both the Binomial and Poisson are discrete distributions, which means that the variable takes on non-negative integer values (such as defect counts). Continuous distributions, on the other hand, allow the variable to take on real values, such as length measurements. For a discrete distribution, the probability of observing a particular outcome is given by the probability mass function; for a continuous distribution, the probability density function (the curve) plays the corresponding role.

With continuous distributions, the probability of observing a range of values is defined by the area under the curve. This area is computed by integration (or by looking up the value in a table of integral values). For a discrete distribution, the probability of observing a range of outcomes is found by summing the probabilities of each outcome in the range.

Example:

Assume that you have two six-sided dice. The possible outcomes for the sum of the two dice are the discrete values from 2 through 12. If you tabulated the different ways (rolls of each die) in which you could reach each of the totals, you would have a histogram that describes the probability distribution for your dice (assuming that they are “fair”).

What is the most frequently occurring sum that you could roll?

What is the probability of obtaining the most likely sum in a single roll of the dice?

What is the probability of obtaining a sum greater than 2 and less than 11?
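The three questions above can be answered by direct enumeration. The following is a minimal sketch, assuming two fair dice; the variable names are illustrative and not part of the original notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely (die1, die2) rolls of two fair six-sided dice
# and count how many rolls produce each sum.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability of each sum = (number of ways to roll that sum) / 36.
pmf = {s: Fraction(count, 36) for s, count in ways.items()}

most_likely_sum = max(pmf, key=pmf.get)
p_most_likely = pmf[most_likely_sum]

# P(2 < sum < 11): add up the individual probabilities for the sums 3 through 10.
p_between = sum(pmf[s] for s in range(3, 11))

print(most_likely_sum, p_most_likely, p_between)  # 7 1/6 8/9
```

The last line illustrates the rule for discrete distributions: the probability of a range of outcomes is the sum of the probabilities of the outcomes in that range.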

There are four types of control charts commonly used with attribute data. The decision on

which to use depends on: (a) whether or not a unit is to be classified defective (having one or

more defects), or if the number of defects in a unit (or per unit) is of interest; and (b) if the size

of the rational sampling group is fixed or variable.

A unit can be classified as defective (or non-conforming) if it contains one or more defects. If

the number of defective units in a sample of units is of interest, and if the number of units in

the sample is constant, the np-Chart is used to track the production process. However, if the

number of units in the sample varies, and the interest is in the fraction (or percentage) of

defective units in the sample, then the p-Chart is used.

Sometimes just the number of defects per inspection unit is a better measure of performance.

If it is easier to count the number of defects in a fixed-size inspection unit (defects per 100

solder joints), then the c-Chart is used. But if the size of the inspection unit could vary (perhaps

the current inspection unit has 350 solder joints this time), then the u-Chart will let us track the

number of defects on a per-unit basis (where the number of inspection units is 3.5 in this case).
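The two questions above (what is being counted, and whether the sample or inspection unit size is fixed) can be written as a small lookup. This is only a sketch of the selection logic described in the text; the function and parameter names are hypothetical.

```python
def choose_attribute_chart(tracks_defective_units, size_is_constant):
    """Suggest an attribute control chart from the two questions in the text.

    tracks_defective_units: True if each unit is classified as defective or not,
        False if individual defects are counted.
    size_is_constant: True if the sample (or inspection unit) size is fixed.
    """
    if tracks_defective_units:
        # Defective units: charts based on the Binomial Distribution.
        return "np-chart" if size_is_constant else "p-chart"
    # Individual defects: charts based on the Poisson Distribution.
    return "c-chart" if size_is_constant else "u-chart"


print(choose_attribute_chart(tracks_defective_units=False, size_is_constant=False))
```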

Figure 1 (below) depicts the decision process for choosing the most appropriate control chart.

The following sections describe how the control limits for these control charts are computed,

and how these charts are interpreted.

P-Charts

P-Charts are derived from the Binomial Distribution, and are used to track the proportion (p)

that are defective within a variable sample size. If D is the number of defective units in a

random sample of size n, then our sample proportion defective will be:

$$\hat{p} = \frac{D}{n}$$

Since these samples come from a binomial distribution, and assuming that we knew the true

proportion defective in all the product was p, then the probability that the number of

defectives (D) in a sample of size n is exactly x units is given by:

$$P\{D = x\} = \binom{n}{x} p^{x} (1-p)^{n-x}$$ where x = 0, 1, 2, …, n

If we took a large enough number of samples, we would find that the mean of the sample proportions defective would be very close to p, and that the variance of the sample proportion would be given by:

$$\sigma_{\hat{p}}^{2} = \frac{p(1-p)}{n}$$
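As a numerical check of the binomial formulas above, the sketch below builds the full distribution of D for illustrative values of n and p (these values are assumptions, not taken from the notes) and confirms that the sample proportion D/n has mean p and variance p(1 - p)/n.

```python
from math import comb, isclose


def binomial_pmf(x, n, p):
    """P{D = x} when D follows a Binomial distribution with parameters n and p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 50, 0.1  # illustrative values only
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Mean and variance of the sample proportion p_hat = D / n, computed from the pmf.
mean_p_hat = sum((x / n) * q for x, q in enumerate(probs))
var_p_hat = sum((x / n - mean_p_hat) ** 2 * q for x, q in enumerate(probs))

print(isclose(sum(probs), 1.0, rel_tol=1e-9))
print(isclose(mean_p_hat, p, rel_tol=1e-9))
print(isclose(var_p_hat, p * (1 - p) / n, rel_tol=1e-9))
```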

If we wanted to do a two-sided hypothesis test to see if the proportion defective from one

sample was different from the proportion defective found in another sample, we could use an

approximate normal distribution and the test statistic:

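$$z_0 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$ where $$\hat{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$$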
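A minimal sketch of this two-sample test follows, assuming the standard pooled form in which the two reciprocal sample sizes are added under the square root. The counts used are hypothetical.

```python
from math import sqrt


def two_proportion_z(d1, n1, d2, n2):
    """Test statistic z0 for H0: p1 == p2, using the pooled proportion defective."""
    p1_hat, p2_hat = d1 / n1, d2 / n2
    # Pooled estimate; identical to (n1*p1_hat + n2*p2_hat) / (n1 + n2).
    p_pooled = (d1 + d2) / (n1 + n2)
    std_err = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / std_err


# Hypothetical data: 12 defectives in 200 units versus 25 defectives in 250 units.
z0 = two_proportion_z(d1=12, n1=200, d2=25, n2=250)
print(round(z0, 3))
```

With a two-sided test at the 5% level, the hypothesis of equal proportions would be rejected when the absolute value of z0 exceeds about 1.96.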

This hypothesis test lends itself to the creation of a control chart for the proportion defective if

we are taking random samples from an industrial process. As we did with the continuous variable control charts, we'll turn the hypothesis test on its side, and estimate a centerline and the upper and lower control limits.

Figure 1. Decision process for choosing a control chart. Attribute (discrete) data: for defective units, possibly with multiple defects (Binomial Distribution), use the np-Chart if the size of the inspection sample is constant and the p-Chart if it varies; for individual defects (Poisson Distribution), use the c-Chart if the size of the inspection unit is constant and the u-Chart if it varies. Variable (continuous) data: use the X-bar and R-Chart if the range is the preferred spread method, and the X-bar and S-Chart if the standard deviation is preferred.

The best guess for the unknown population proportion defective would be to find the mean

proportion defective over a large number of samples, and let this become our centerline:

$$\bar{p} = \frac{\sum_{i=1}^{m} \hat{p}_i}{m} = \frac{\sum_{i=1}^{m} D_i}{mn}$$ where m is the number of samples, each of size n

As before, when we do not yet have an idea of the process’ performance, we would estimate

the control limits from a large number (20-25) of independent and random samples. Using plus

and minus three standard deviations from the centerline (Shewhart style), the trial control

limits for the proportion defective in any particular sample are:

$$UCL = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$$

$$CL = \bar{p}$$

$$LCL = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$$

These trial control limits would be plotted along with the individual, time-ordered sample data,

and then checked to be sure that all samples were within the control limits. If not, we would

investigate the out-of-control points, remove the special cause (if found) and recalculate the

trial limits without any out-of-control samples in the data. Then we would use those control

limits for production monitoring purposes.
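The trial-limit procedure can be sketched as follows for a constant sample size. The data are hypothetical, and clamping the limits to the interval [0, 1] is an added assumption (a proportion cannot fall outside that range), not something stated in the notes.

```python
from math import sqrt


def p_chart_limits(defectives, n, k=3.0):
    """Trial (LCL, CL, UCL) for a p-chart built from m samples of constant size n."""
    m = len(defectives)
    p_bar = sum(defectives) / (m * n)  # centerline
    half_width = k * sqrt(p_bar * (1 - p_bar) / n)
    # Assumption: clamp the limits so they stay inside [0, 1].
    return max(0.0, p_bar - half_width), p_bar, min(1.0, p_bar + half_width)


# Hypothetical data: defectives found in 20 samples of n = 100 units each.
defectives = [4, 6, 5, 3, 7, 5, 4, 6, 5, 15, 4, 5, 6, 3, 5, 4, 6, 5, 4, 5]
n = 100

lcl, cl, ucl = p_chart_limits(defectives, n)

# Indices of samples whose proportion defective falls outside the trial limits.
out_of_control = [i for i, d in enumerate(defectives) if not lcl <= d / n <= ucl]
print(lcl, cl, round(ucl, 4), out_of_control)
```

If a special cause were found for a flagged sample, that sample would be dropped and `p_chart_limits` called again on the remaining data to obtain the revised limits.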

If the control limits were to be calculated from a standard value (prior history) for the

proportion defective, the formulation is similar (we replace the sample parameters with the

standard population values):

$$UCL = p + 3\sqrt{\frac{p(1-p)}{n}}$$

$$CL = p$$

$$LCL = p - 3\sqrt{\frac{p(1-p)}{n}}$$

If we desired control limits at a different point (either further out from, or closer into the mean)

we could replace the constant 3 with a different value (2 for 2σ limits, 6 for 6σ limits…). Note

also that we should pick our sample size so that there is a high probability of finding at least one

defect in a sample – otherwise, we would effectively accept a zero-defects sample (rendering

our lower control limit useless in detecting important shifts in our process).

In practice, however, the sample size for a p-chart does not have to be held constant. Usually,

we would estimate the mean sample size ($\bar{n}$) and substitute it for the fixed sample size (n) in the above equations. The mean sample size from m samples of differing sizes is found by:

$$\bar{n} = \frac{\sum_{i=1}^{m} n_i}{m}$$

A more exact alternative would be to compute variable width control limits that change with

the individual sample size. If we started with m samples (20 ≤ m ≤ 25) of individual size $n_i$, then we would estimate the centerline (once) at $\bar{p}$ from:

$$\bar{p} = \frac{\sum_{i=1}^{m} D_i}{\sum_{i=1}^{m} n_i}$$

And then we could have the control limits vary in width about this centerline as the sample size

changed, using the limits:

$$UCL_i = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n_i}}$$

$$CL = \bar{p}$$

$$LCL_i = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n_i}}$$
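A short sketch of the variable-width limits follows. Only four hypothetical samples are used to keep the example small (the notes recommend 20 to 25), and clamping to [0, 1] is again an added assumption.

```python
from math import sqrt


def variable_p_chart(defectives, sizes, k=3.0):
    """Return the single centerline p_bar and a (LCL_i, UCL_i) pair for each sample."""
    p_bar = sum(defectives) / sum(sizes)  # computed once from all samples
    limits = []
    for n_i in sizes:
        half_width = k * sqrt(p_bar * (1 - p_bar) / n_i)
        # Assumption: clamp the limits so they stay inside [0, 1].
        limits.append((max(0.0, p_bar - half_width), min(1.0, p_bar + half_width)))
    return p_bar, limits


defectives = [5, 8, 3, 6]      # hypothetical defective counts D_i
sizes = [100, 150, 80, 120]    # hypothetical sample sizes n_i

p_bar, limits = variable_p_chart(defectives, sizes)
for n_i, (lcl_i, ucl_i) in zip(sizes, limits):
    print(n_i, round(lcl_i, 4), round(ucl_i, 4))
```

Larger samples produce narrower limits about the same centerline, which is why the plotted limits step in and out from one sample to the next.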

NOTE: When using variable width control limits, it is not possible to utilize rules for detecting

runs. In general, run rules are not used with p-charts. The lack of a strong statistical basis for

these run rules is one of the reasons that continuous variable control charts are preferred to

attributes charts – there is simply more information available from the continuous variable than

from the discrete variable.

NP-Charts

The np-Chart is used to track the number of defective units in a sample of units (rather than the

proportion of defective units). Like the p-chart, this chart is derived from the Binomial

Distribution. However, the np-Chart always requires a fixed sample size. Calculating the

control limits from sample data leads to:

$$UCL = n\bar{p} + 3\sqrt{n\bar{p}(1-\bar{p})}$$

$$CL = n\bar{p}$$

$$LCL = n\bar{p} - 3\sqrt{n\bar{p}(1-\bar{p})}$$

And if there was a historical standard for estimating np, then the control limits become:

$$UCL = np + 3\sqrt{np(1-p)}$$

$$CL = np$$

$$LCL = np - 3\sqrt{np(1-p)}$$

For an np-Chart, the control limits are constant (until we improve the process and recalculate

tighter control limits). In this case, as long as we have a sample of inspection units that has a

high probability of having at least one defective unit in each sample, we can utilize the run rules

without violating assumptions too much. This gives us a slightly more powerful control chart

than the p-chart, at the cost of inspecting a slightly larger sample of the units.
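The np-chart limits estimated from sample data can be sketched in the same way. The counts below are hypothetical, and flooring the lower limit at zero is an added assumption (a count of defective units cannot be negative).

```python
from math import sqrt


def np_chart_limits(defectives, n, k=3.0):
    """Trial (LCL, CL, UCL) for an np-chart built from m samples of constant size n."""
    m = len(defectives)
    p_bar = sum(defectives) / (m * n)
    center = n * p_bar
    half_width = k * sqrt(n * p_bar * (1 - p_bar))
    # Assumption: floor the lower limit at zero.
    return max(0.0, center - half_width), center, center + half_width


# Hypothetical data: defective units found in 10 samples of n = 200 units each.
defectives = [10, 12, 8, 11, 9, 10, 13, 7, 10, 10]
lcl, cl, ucl = np_chart_limits(defectives, n=200)
print(round(lcl, 4), cl, round(ucl, 4))
```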

C-Charts

Sometimes the presence of a defect does not “ruin” the product, even if defects are

undesirable. For example, a farmer might still buy a tractor even with a few scratches in the

paint on one fender, or a computer programmer might still accept an LCD monitor with one or

two defective pixels. However, a good manufacturer would still wish to track the number of

defects occurring in each product in order to improve and continue to compete. C- and u-

Charts work to track the number of defects that occur as a product is created.

The c-chart is derived from the Poisson Distribution, which assumes that the opportunities for defects to occur are essentially infinite (e.g., small defects occurring within a large area). If x is a given number of defects, then the probability of observing x defects in an inspection unit is:

$$p(x) = \frac{e^{-c} c^{x}}{x!}$$ where c is the true mean count of the number of occurrences per unit

For the Poisson distribution, the mean and the variance are the same, and both are equal to c.

This information can be used to set up an approximate Normal hypothesis test (but it is quicker

to just cut to the derivation of the control chart limits!).

The mean count of defects occurring per inspection unit is best estimated by counting the total

number of defects occurring over a large number of inspection units:

$$\bar{c} = \frac{\text{total number of defects}}{\text{total inspection units}}$$

This parameter will represent our center line, but we will also need upper and lower control

limits. If we are working to establish control limits from sample data, the formulation would

be:

$$UCL = \bar{c} + 3\sqrt{\bar{c}}$$

$$CL = \bar{c}$$

$$LCL = \bar{c} - 3\sqrt{\bar{c}}$$ (or 0 if the LCL is negative)

Alternatively, if we are continuing to use an existing and stable process, the “standard” value of

c could be used for the control limits by:

$$UCL = c + 3\sqrt{c}$$

$$CL = c$$

$$LCL = c - 3\sqrt{c}$$ (or 0 if the LCL is negative)

In all cases of the c-Chart, the inspection unit is a constant size. Provided that the LCL is greater

than zero, then we will have constant control limits and we can apply the rules for detecting

runs in addition to the out-of-control point criteria to determine if our process is stable and in-

control.
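A minimal sketch of the c-chart limits estimated from sample data, using hypothetical defect counts for inspection units of constant size:

```python
from math import sqrt


def c_chart_limits(defect_counts, k=3.0):
    """Trial (LCL, CL, UCL) for a c-chart; each count is for one inspection unit."""
    c_bar = sum(defect_counts) / len(defect_counts)
    half_width = k * sqrt(c_bar)  # Poisson: the variance equals the mean
    # The lower limit is set to 0 when the computed value is negative.
    return max(0.0, c_bar - half_width), c_bar, c_bar + half_width


# Hypothetical data: defects counted in 10 inspection units.
defect_counts = [4, 6, 3, 5, 7, 4, 5, 6, 2, 8]
lcl, cl, ucl = c_chart_limits(defect_counts)
print(lcl, cl, round(ucl, 4))
```

In this example the mean count is small enough that the computed lower limit is negative, so it is replaced by zero, which is the situation in which the chart cannot signal an improvement in the process.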

U-Charts

These charts are used when the size of the inspection unit may vary. (In fact, the size might not

even be an integer multiple of the inspection units!) Assuming that we have to generate the

control limits from a pool of 20-25 samples, our best estimate for the center line for the u-chart

is:

$$\bar{u} = \frac{\text{total number of defects}}{\text{total units inspected}}$$

One option for the u-chart is to use the mean sample size in computing the upper and lower

control limits. The mean sample size and the control limits are computed from:

$$\bar{n} = \frac{\sum_{i=1}^{m} n_i}{m}$$

$$UCL = \bar{u} + 3\sqrt{\frac{\bar{u}}{\bar{n}}}$$

$$CL = \bar{u}$$

$$LCL = \bar{u} - 3\sqrt{\frac{\bar{u}}{\bar{n}}}$$

Another, more exact alternative is to use variable control limits. Similar to the variable limit p-chart, we would compute our centerline once from our sample data, and then use it to change the limits with each sample. From this point, we can compute our control limits for each individual sample size ($n_i$) by:

$$UCL_i = \bar{u} + 3\sqrt{\frac{\bar{u}}{n_i}}$$

$$CL = \bar{u}$$

$$LCL_i = \bar{u} - 3\sqrt{\frac{\bar{u}}{n_i}}$$

As with the other variable limit control chart, the ability to use run tests is forfeited.

Additionally, if the defects occur in clusters (i.e., the presence of one defect makes it more likely for another defect to occur), then the defects do not follow a Poisson Distribution and the control limits will not be very precise. In some instances, mixtures of defect types can cause clustering.
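The variable-limit u-chart calculation described above can be sketched as follows. The data are hypothetical, only four samples are used to keep the example short, and flooring the lower limit at zero is an added assumption carried over from the c-chart.

```python
from math import sqrt


def u_chart_limits(defects, units, k=3.0):
    """Return the centerline u_bar and a (LCL_i, UCL_i) pair for each sample."""
    u_bar = sum(defects) / sum(units)  # defects per inspection unit, computed once
    limits = []
    for n_i in units:
        half_width = k * sqrt(u_bar / n_i)
        # Assumption: floor the lower limit at zero.
        limits.append((max(0.0, u_bar - half_width), u_bar + half_width))
    return u_bar, limits


defects = [7, 12, 5, 9]          # hypothetical defect counts per sample
units = [3.5, 5.0, 2.0, 4.5]     # e.g. 350 solder joints = 3.5 units of 100 joints

u_bar, limits = u_chart_limits(defects, units)
plotted = [d / n_i for d, n_i in zip(defects, units)]  # points drawn on the chart
print(round(u_bar, 4), [round(u, 4) for u in plotted])
```

Note that the number of inspection units in a sample does not have to be a whole number.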

In some cases, when the defect rates are in the low parts-per-million range, the size of the

inspection unit will grow very large. U-Charts can also be used if the plotted variable is changed

to be the time-between-successive-defects, with much lower inspection frequency/cost.

CONTROL CHARTS FOR VARIABLES

Variables are the measurable characteristics of a product or service. Measurement data

is taken and arrayed on charts. The types of charts are often classified according to the type of

quality characteristic that they are supposed to monitor: there are quality control charts

for variables and control charts for attributes. Specifically, the following charts are commonly

constructed for controlling variables:

1) X-bar chart

In this chart the sample means are plotted in order to control the mean value of a

variable (e.g., size of piston rings, strength of materials, etc.). The charts' x-axes are time based,

so that the charts show a history of the process. For this reason, data should be time-ordered;

that is, entered in the sequence from which it was generated. If this is not the case, then trends

or shifts in the process may not be detected, but instead attributed to random (common cause)

variation. For subgroup sizes greater than ten, use X-bar / Sigma charts, since the range

statistic is a poor estimator of process sigma for large subgroups.

2) R chart

In this chart, the sample ranges are plotted in order to control the variability of a variable.

3) S chart

In this chart, the sample standard deviations are plotted in order to control the variability of

a variable. For sample sizes greater than 10 (n > 10), the S-chart is more efficient than the R-chart, so the X-bar chart and the S-chart should be used together.

4) S² chart

In this chart, the sample variances are plotted in order to control the variability of a variable.

5) X-bar and R charts

An Xbar-R chart plots the process mean (Xbar chart) and process range (R chart) over

time for variables data in subgroups. This combination control chart is widely used to examine

the stability of processes in many industries.

For example, Xbar-R can be used to monitor the process mean and variation for

subgroups of part lengths, call times, or hospital patients' blood pressure over time.

The Xbar chart and the R chart are displayed together because both charts can

determine whether your process is stable. Examine the R chart first because the process

variation must be in control to correctly interpret the Xbar chart. The control limits of the Xbar

chart are calculated considering both process spread and center. If the R chart is out of control,

then the control limits on the Xbar chart may be inaccurate and may falsely indicate an out-of-

control condition or fail to detect one.

6) X-bar and S charts

An Xbar-S chart plots the process mean (Xbar chart) and process standard deviation

(S chart) over time for variables data in subgroups. This combination control chart is widely

used to examine the stability of processes in many industries.

The Xbar-S chart is very similar to the Xbar-R chart. The major difference is that the

subgroup standard deviation is plotted when using the Xbar-S chart, while the subgroup

range is plotted when using the Xbar-R chart. One advantage of using the standard deviation

instead of the range is that the standard deviation takes into account all the data, not just

the maximum and the minimum.

The figure below is the Xbar chart. The subgroup means (Xbar values) are plotted on this chart. Three lines

are plotted on the chart. The middle line is the overall process average; the upper line is the

upper control limit; and the lower line is the lower control limit.
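The notes do not give formulas for these three lines. As an illustration only, the sketch below uses the conventional Shewhart Xbar-R construction, in which the limits are the grand mean plus or minus A2 times the average range, and the R chart limits are D3 and D4 times the average range. The constants are the standard tabulated values for subgroups of size 5; the measurements are hypothetical.

```python
# Standard tabulated Shewhart constants for subgroups of size 5 (assumed here;
# the notes themselves do not list them).
A2, D3, D4 = 0.577, 0.0, 2.114


def xbar_r_limits(subgroups):
    """Return (LCL, CL, UCL) for the Xbar chart and (LCL, CL, UCL) for the R chart."""
    means = [sum(g) / len(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    grand_mean = sum(means) / len(means)  # middle line of the Xbar chart
    r_bar = sum(ranges) / len(ranges)     # middle line of the R chart
    xbar_limits = (grand_mean - A2 * r_bar, grand_mean, grand_mean + A2 * r_bar)
    r_limits = (D3 * r_bar, r_bar, D4 * r_bar)
    return xbar_limits, r_limits


# Hypothetical measurements: four subgroups of five parts each.
subgroups = [
    [10.1, 9.9, 10.0, 10.2, 9.8],
    [10.0, 10.3, 9.9, 10.1, 10.2],
    [9.7, 10.0, 10.1, 9.9, 9.8],
    [10.2, 10.0, 9.9, 10.1, 9.8],
]
xbar_limits, r_limits = xbar_r_limits(subgroups)
print([round(v, 4) for v in xbar_limits], [round(v, 4) for v in r_limits])
```

Because the Xbar limits are built from the average range, an out-of-control R chart makes them unreliable, which is why the R chart is examined first.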

CONCLUSIONS

In general, continuous variable control charts will detect smaller changes earlier than

attribute control charts can. The Central Limit Theorem can be used to justify an approximation

of attribute data with control charts based on the Normal Distribution. Finally, continuous

variable control charts normally require much smaller sample sizes as well.

However, attribute control charts can cover several defect types on one chart, whereas two charts (X-bar and R- or S-Charts) are required for each single characteristic to be measured. And

continuous variables generally require more refined equipment and time to complete the

measurement, leading to a higher inspection cost.

