cfa level 1 quantitative analysis e book - part 4(1)
TRANSCRIPT
-
8/13/2019 CFA Level 1 Quantitative Analysis E Book - Part 4(1)
1/40
CFA Quantitative Analysis E-Book 4 of 8
www.educorporatebridge.com All Rights Reserved. Corporate Bridge TM
Quantitative Analysis E-Book
Part 4 of 8
Sampling and Estimation
1. Introduction.
In investment analysis, it is often impossible to study every member of the population. Even if analysts
could examine the entire population, it may not be economically efficient to do so. Sampling is the process
of obtaining a sample. A simple random sample is a sample obtained in such a way that each element of the
population has an equal probability of being selected. The selection of any one element has no impact on
the chance of selecting another element.
A sample is random if the method for obtaining the sample meets the criterion of randomness (each
element having an equal chance at each draw). The word 'simple' tells you that the process is not difficult,
and the word 'random' tells you that you don't know in advance which observations will be selected in the
sample. The actual composition of the sample itself does not determine whether or not it's a random
sample.
Example
Suppose that a company has 30 directors, and you wish to choose 10 of them to serve on a committee. You
could place the names of the 30 directors on separate pieces of paper and draw them out one by one, until
you have drawn a sample of size 10.
Note that the conditions for simple random sampling have been satisfied, in that every one of the 30
directors has an equal (non-zero) chance of being selected in the sample.
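The draw described above can be sketched in a few lines of Python. This is only an illustration: the director labels and the fixed seed are made up, and `random.sample` stands in for pulling names out of a hat without putting them back.

```python
import random

# Hypothetical labels for the 30 directors (not from the text).
directors = [f"Director {i}" for i in range(1, 31)]

rng = random.Random(42)  # fixed seed only so the run is repeatable

# random.sample draws without replacement, so no name can appear twice,
# and every director has the same chance of ending up on the committee.
committee = rng.sample(directors, 10)

print(committee)
```

Because the draw is without replacement, the committee always contains 10 different people.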
In this example, it makes no sense to sample with replacement, as this would mean that once you have
drawn a name, that name goes back into the hat (i.e. it is replaced) and can be drawn again. If the same
person's name is drawn more than once, then you won't end up with 10 different people after drawing 10
names, so this experiment should be done without replacement.
A biased sample is one in which the method used to create the sample results in samples that are
systematically different from the population. For instance, consider a research project on attitudes toward
cricket. Collecting the data by publishing a questionnaire in a magazine and asking people to fill it out and
send it in would produce a biased sample. People interested enough to spend their time and energy filling
out and sending in the questionnaire are likely to have different attitudes toward cricket than those not
taking the time to fill out the questionnaire.
It is important to realize that it is the method used to create the sample, not the actual makeup of the sample
itself, that defines the bias. A random sample that is very different from the population is not biased: it is by
definition not systematically different from the population. It is randomly different.
SAMPLING ERROR
The sample taken from a population is used to infer conclusions about the population. However, it's
unlikely that the sample statistic would be identical to the population parameter. Suppose there is a class of
100 students, and a sample of size 10 from that class is chosen. If by chance the sample consists mostly of
the brightest students, it gives a misleading picture of the population, because the
sample mean x-bar will be much higher than the population mean in this case. Equally, a sample
comprising mainly weaker students could be chosen, and then the opposite would apply. The ideal is
to have a sample that comprises a few bright students, a few weaker students, and mainly average
students, as this will give a good idea of the composition of the population. However, because you cannot
control which items go into the sample, you are dependent to some degree on chance as to whether the results
are favorable or not.
Sampling error (also called error of estimation) is the difference between the observed value of a statistic
and the quantity it is intended to estimate. For example, sampling error of the mean equals sample mean
minus population mean.
Sampling error can apply to statistics such as the mean, the variance, the standard deviation or any other
values that can be obtained from the sample. The sampling error varies from sample to sample. A good
estimator is one whose sampling error distribution is highly concentrated about the population parameter
value.
Sampling error of the mean would be: Sample mean - population mean = $\bar{x} - \mu$.
Sampling error of the standard deviation would be: Sample standard deviation - population standard
deviation = $s - \sigma$.
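A small simulation may make the two definitions concrete. The sketch below assumes a made-up class of 100 marks (normally distributed around 60, purely for illustration) and computes both sampling errors for one random sample of 10.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: marks of a class of 100 students.
population = [rng.gauss(60, 10) for _ in range(100)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)   # population standard deviation

sample = rng.sample(population, 10)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)            # sample standard deviation (n - 1)

error_of_mean = x_bar - mu              # sampling error of the mean
error_of_sd = s - sigma                 # sampling error of the standard deviation

print(f"sampling error of the mean: {error_of_mean:+.3f}")
print(f"sampling error of the standard deviation: {error_of_sd:+.3f}")
```

Changing the seed changes the sample, and with it both errors, which is the sense in which sampling error varies from sample to sample.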
Sampling distribution
A sample statistic itself is a random variable, which varies depending upon the composition of the sample.
It therefore has a probability distribution. The sampling distribution of a statistic is the distribution of all
the distinct possible values that the statistic can assume when computed from samples of the same size
randomly drawn from the same population. The most commonly used sample statistics include the mean,
variance and standard deviation.
If you compute the mean of a sample of 10 numbers, the value you obtain will not equal the population
mean exactly; by chance it will be a little bit higher or a little bit lower. If you sampled sets of 10 numbers
over and over again (computing the mean for each set), you would find that some sample means come
much closer to the population mean than others. Some would be higher than the population mean and
some would be lower. Imagine sampling 10 numbers and computing the mean over and over again, say
about 1,000 times, and then constructing a relative frequency distribution of those 1,000 means. This
distribution of means is a very good approximation to the sampling distribution of the mean. The sampling
distribution of the mean is a theoretical distribution that is approached as the number of samples in the
relative frequency distribution increases. With 1,000 samples, the relative frequency distribution is quite
close; with 10,000 it is even closer. As the number of samples approaches infinity, the relative frequency
distribution approaches the sampling distribution.
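The thought experiment above is easy to try out. The following sketch assumes a made-up population of 10,000 uniformly distributed values (any population would do), draws 1,000 samples of size 10, and prints a coarse relative frequency distribution of the 1,000 sample means.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population; the uniform shape is an assumption for the demo.
population = [rng.uniform(0, 100) for _ in range(10_000)]
mu = statistics.mean(population)

def sample_means(num_samples, sample_size=10):
    """Mean of each of `num_samples` random samples of `sample_size`."""
    return [statistics.mean(rng.sample(population, sample_size))
            for _ in range(num_samples)]

means = sample_means(1_000)

# Relative frequency distribution of the means, in bins of width 10.
bins = {}
for m in means:
    lower = int(m // 10) * 10
    bins[lower] = bins.get(lower, 0) + 1
for lower in sorted(bins):
    print(f"{lower:3d}-{lower + 10:3d}: {bins[lower] / len(means):.3f}")

print("population mean:", round(mu, 2))
print("mean of sample means:", round(statistics.mean(means), 2))
```

Raising the number of samples (not the sample size of 10) makes the printed frequencies settle closer to the theoretical sampling distribution.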
The sampling distribution of the mean for a sample size of 10 is just an example; there is a different
sampling distribution for other sample sizes. Also, keep in mind that the relative frequency distribution
approaches a sampling distribution as the number of samples increases, not as the sample size increases,
since there is a different sampling distribution for each sample size.
A sampling distribution can also be defined as the relative frequency distribution that would be obtained if
all possible samples of a particular sample size were taken. For example, the sampling distribution of the
mean for a sample size of 10 would be constructed by computing the mean for each of the possible ways in
which 10 scores could be sampled from the population and creating a relative frequency distribution of
these means. Although these two definitions may seem different, they are actually the same: both
procedures produce exactly the same sampling distribution.
Statistics other than the mean have sampling distributions too. The sampling distribution of the median is
the distribution that would result if the median instead of the mean were computed in each sample.
Sampling distributions are very important since almost all inferential statistics are based on sampling
distributions.
Simple random vs. stratified random sampling
In stratified random sampling, the population is subdivided into subpopulations (strata) based on one or
more classification criteria. Simple random samples are then drawn from each stratum, with the size of each
sample proportional to the relative size of its stratum in the population. These samples are then
pooled.
It is important to note that the size of the data in each stratum does not have to be the same or even similar,
and frequently isn't.
Stratified random sampling guarantees that population subdivisions of interest are represented in the
sample. The estimates of parameters produced from stratified sampling generally have greater precision (i.e. smaller
variance or dispersion) than estimates obtained from simple random sampling.
For example, investors may want to fully duplicate a bond index by owning all the bonds in the index in
proportion to their market value weights. This is known as pure bond indexing. However, it's difficult and
costly to implement because a bond index typically consists of thousands of issues. If simple random sampling is
used, the sample selected may not accurately reflect the risk factors of the index. Stratified random
sampling can be used to replicate the bond index:
Divide the population of index bonds into groups with similar risk factors (e.g. issuer, duration/maturity,
coupon rate, credit rating, call exposure, etc.). Each group is called a stratum or cell.
Select a sample from each cell proportional to the relative market weighting of the cell in the index.
A stratified sample will ensure that at least one issue in each cell is included in the sample.
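The two steps above can be sketched as follows. Everything in the example is hypothetical: the index has only three rating cells, and each cell's share of the index is measured by its number of issues as a simple stand-in for market-value weight.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical index: each bond is (bond_id, cell). Real cells would
# combine several risk factors (duration, coupon, rating, and so on).
index = ([(f"AAA-{i}", "AAA") for i in range(500)]
         + [(f"A-{i}", "A") for i in range(300)]
         + [(f"BBB-{i}", "BBB") for i in range(200)])

def stratified_sample(bonds, total_size):
    """Draw a simple random sample from each cell, sized in proportion
    to the cell's share of the index (at least one bond per cell)."""
    cells = {}
    for bond_id, cell in bonds:
        cells.setdefault(cell, []).append(bond_id)
    sample = {}
    for cell, members in cells.items():
        k = max(1, round(total_size * len(members) / len(bonds)))
        sample[cell] = rng.sample(members, k)
    return sample

portfolio = stratified_sample(index, 50)
for cell, picks in portfolio.items():
    print(cell, len(picks))   # AAA 25, A 15, BBB 10
```

The `max(1, ...)` guard is what guarantees that even a very small cell contributes at least one issue.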
Time-series and cross-sectional data.
Data come in many different shapes and sizes, and measure many different things at different times. Often
financial analysts are interested in particular types of data such as time-series data or cross-sectional data.
Time-series data is a set of observations collected at usually discrete and equally spaced time intervals. For
example, the daily closing price of a certain stock recorded over the last six weeks is an example of
time-series data. Note that a time period that is too long or too short may lead to time-period bias. Refer to subject g for
details.
Other examples of time-series would be staff numbers at a particular institution taken on a monthly basis in
order to assess staff turnover rates, weekly sales figures of ice-cream sold during a holiday period at a
seaside resort and the number of students registered for a particular course on a yearly basis. All of the
above would be used to forecast likely data patterns in the future.
Cross-sectional data are observations that come from different individuals or groups at a single point in
time. For example, the closing prices of a group of 20 different tech stocks on December
15, 1986 would be an example of cross-sectional data. Note that the underlying population should
consist of members with similar characteristics. For example, suppose you are interested in how much
companies spend on research and development (R&D). Firms in some industries such as retail spend
little on R&D, while firms in industries such as technology spend heavily on
R&D. Therefore, it's inappropriate to summarize R&D data across all companies. Rather, analysts should
summarize R&D data by industry, and then analyze the data in each industry group.
Other examples of cross-sectional data would be an inventory of all ice creams in stock at a particular
store, or a list of grades obtained by a class of students for a specific test.
2. The Central Limit Theorem.
The central limit theorem states that given a distribution with a mean $\mu$ and variance $\sigma^2$, the sampling
distribution of the mean x-bar approaches a normal distribution with a mean $\mu$ and a variance $\sigma^2/N$ as N,
the sample size, increases.
The amazing and counter-intuitive thing about the central limit theorem is that no matter what the shape of
the original distribution, x-bar approaches a normal distribution.
If the original variable X has a normal distribution, then x-bar will be normal regardless of the
sample size.
If the original variable X does not have a normal distribution, then x-bar will be approximately normal
when $N \ge 30$ (a common rule of thumb). This is called a distribution-free result. This means that no matter what distribution X has,
x-bar will still be approximately normal for sufficiently large N.
Keep in mind that N is the sample size for each mean and not the number of samples. Remember in a
sampling distribution the number of samples is assumed to be infinite. The sample size is the number of
scores in each sample; it is the number of scores that goes into the computation of each mean.
Two things should be noted about the effect of increasing N:
1. The distributions become more and more normal.
2. The spread of the distributions decreases.
Based on the central limit theorem, when the sample size is large, you can:
1. Use the sample mean to infer the population mean.
2. Construct confidence intervals for the population mean based on the normal distribution.
Note that the central limit theorem does not require the underlying population to be normally
distributed. Therefore, the central limit theorem can be applied to a population with any probability
distribution (provided its variance is finite).
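A quick simulation can illustrate both effects of increasing N. The sketch below assumes a strongly skewed population (exponential with mean 1, so $\mu = 1$ and $\sigma^2 = 1$), chosen only for the demonstration, and compares the observed mean and variance of x-bar with $\mu$ and $\sigma^2/N$.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential population (mu = 1, sigma^2 = 1)."""
    return sum(rng.expovariate(1.0) for _ in range(n)) / n

results = {}
for n in (2, 30, 200):
    # 2,000 samples of size n approximate the sampling distribution of x-bar.
    means = [sample_mean(n) for _ in range(2_000)]
    results[n] = (statistics.mean(means), statistics.variance(means))
    print(f"N={n:3d}  mean of x-bar={results[n][0]:.3f}  "
          f"variance of x-bar={results[n][1]:.4f}  sigma^2/N={1 / n:.4f}")
```

The mean of x-bar stays close to 1 for every N, while its variance shrinks roughly in line with 1/N.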
3. Standard Error of the Sample Mean.
The standard error of a statistic is the standard deviation of the sampling distribution of that statistic.
Standard errors are important because they reflect how much sampling fluctuation a statistic will show. The
inferential statistics involved in the construction of confidence intervals and significance testing are based
on standard errors. The standard error of a statistic depends on the sample size. In general, the larger the
sample size, the smaller the standard error. The standard error of a statistic is usually designated by the
Greek letter sigma ($\sigma$) with a subscript indicating the statistic.
The standard error of the mean is designated as $\sigma_m$. It is the standard deviation of the sampling
distribution of the mean. The formula for the standard error of the mean is $\sigma_m = \sigma / \sqrt{N}$, where $\sigma$ is the
standard deviation of the original distribution and N is the sample size (the number of scores each mean is
based upon). This formula does not assume a normal distribution. However, many of the uses of the
formula do assume a normal distribution. The formula shows that the larger the sample size, the smaller the
standard error of the mean. More specifically, the size of the standard error of the mean is inversely
proportional to the square root of the sample size.
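The inverse square-root relationship is easy to check numerically. In this sketch the population standard deviation of 20 is an arbitrary, made-up value; quadrupling the sample size halves the standard error.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(N)."""
    return sigma / math.sqrt(n)

sigma = 20.0  # hypothetical population standard deviation
for n in (25, 100, 400):
    print(n, standard_error(sigma, n))   # 4.0, then 2.0, then 1.0
```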
Example 2
Suppose that the mean grade of students in a class is unknown, but a sample of 30 students is taken from
the class, and the mean from the sample is found to be 60%, with a standard deviation of 9%. Calculate the
standard error of the sample mean, and interpret your results.
Now, $\mu$ and $\sigma$ are unknown, but the sample mean m is given as 60 and the sample standard deviation
s is given as 9. Since n = 30, you can estimate the standard error of the sample mean as: $9 / \sqrt{30} = 1.6432$.
This means that, across all possible samples of size 30 from the class, the sample means would be expected
to vary around the true class mean with a standard deviation of roughly 1.6432 percentage points.
It is important to note that when you have $\sigma$, you must use it; but when you don't, you use its sample
equivalent s.
4. Estimators.
Very often, there are a number of different estimators that can be used to estimate unknown population
parameters. When faced with such a choice, it is desirable to know that the estimator chosen is the "best"
under the circumstances, that is, it has more desirable properties than any of the other options available.
There are three desirable properties of estimators:
1. Unbiasedness: An estimator is unbiased if its expected value (the mean of its sampling distribution) equals the
parameter it is intended to estimate. For example, the sample mean is an unbiased estimator of the
population mean, because the expected value of the sample mean is equal to the population mean.
2. Efficiency: An estimator is efficient if no other unbiased estimator of the same parameter has a
sampling distribution with smaller variance. That is, in repeated samples, analysts expect the
estimates from an efficient estimator to be more tightly grouped around the mean than estimates
from other unbiased estimators. For example, the sample mean is an efficient estimator of the
population mean, and the sample variance is an efficient estimator of the population variance.
3. Consistency: A consistent estimator is one for which the probability of accurate estimates
(estimates close to the value of the population parameter) increases as sample size increases. In
other words, a consistent estimator's sampling distribution becomes concentrated on the value of
the parameter it is intended to estimate as the sample size approaches infinity. For example, as the
sample size increases to infinity, the standard error of the sample mean declines to 0, and the
sampling distribution concentrates around the population mean. Therefore, the sample mean is a
consistent estimator of the population mean.
The single estimate of an unknown population parameter calculated as a sample mean is called a point
estimate of the mean. The formula used to compute the point estimate is called an estimator. The specific
value calculated from sample observations using an estimator is called an estimate. For example, the
sample mean is a point estimate of the population mean. Suppose two samples are taken from a population,
and the sample means are 16 and 21 respectively. Therefore, 16 and 21 are two estimates of the population
mean. Note that an estimator will yield different estimates as repeated samples are taken from the same
population.
A confidence interval is an interval for which one can assert with a given probability $1 - \alpha$, called the
degree of confidence, that it will contain the parameter it is intended to estimate. This interval is often
referred to as the $100(1 - \alpha)\%$ confidence interval for the parameter, where $\alpha$ is referred to as the level of
significance. The end points of a confidence interval are called the lower and upper confidence limits.
For example, suppose that a 95% confidence interval for the population mean is 20 to 40. This means that:
One can be 95% confident that the population mean lies in the range of 20 to 40;
"95%" is the degree of confidence; "5%" is the level of significance;
20 and 40 are the lower and upper confidence limits, respectively.
5. Confidence Intervals for the Population Mean.
Confidence intervals are typically constructed by using the following structure:
Confidence Interval = Point Estimate ± Reliability Factor x Standard Error
Point estimate is the value of a sample statistic used to estimate the population parameter.
Reliability factor is a number based on the sampling distribution of the point estimate and the
degree of confidence ($1 - \alpha$).
Standard error refers to the standard error of the sample statistic that is used to produce the point
estimate.
Whatever the distribution of the population, the sample mean is always the point estimate used to construct
the confidence intervals for the population mean. The reliability factor and the standard error, however,
may vary depending on three factors:
1. Distribution of population: normal or non-normal.
2. Population variance: known or unknown.
3. Sample size: large or small.
z-Statistic: a standard normal random variable
If a population is normally distributed with a known variance, the z-statistic is used as the reliability factor to
construct confidence intervals for the population mean.
In practice, the population standard deviation is rarely known. However, learning how to compute a
confidence interval when the standard deviation is known is an excellent introduction to how to compute a
confidence interval when the standard deviation has to be estimated.
Three values are used to construct a confidence interval for $\mu$:
1. The sample mean ($m$);
2. The value of $z$ (which depends on the level of confidence); and
3. The standard error of the mean ($\sigma_m$).
The confidence interval has $m$ for its center and extends a distance equal to the product of $z$ and $\sigma_m$ in both
directions. Therefore, the formula for a confidence interval is:
$m - z\,\sigma_m \le \mu \le m + z\,\sigma_m$
For a $100(1 - \alpha)\%$ confidence interval for the population mean, the z-statistic to be used is $z_{\alpha/2}$, which denotes
the point of the standard normal distribution such that $\alpha/2$ of the probability falls in the right-hand tail.
Effectively, the $100(1 - \alpha)\%$ of the area that makes up the confidence interval falls in
the center of the graph, that is, symmetrically around the mean. This leaves a total of $\alpha$ of the area in the two tails, or
$\alpha/2$ in each tail.
Commonly used reliability factors are as follows:
90% confidence intervals: $z_{0.05} = 1.645$. $\alpha$ is 10%, with 5% in each tail.
95% confidence intervals: $z_{0.025} = 1.96$. $\alpha$ is 5%, with 2.5% in each tail.
99% confidence intervals: $z_{0.005} = 2.575$ (often rounded to 2.58). $\alpha$ is 1%, with 0.5% in each tail.
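These reliability factors can be reproduced without a z table. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_reliability_factor` is made up for the illustration.

```python
from statistics import NormalDist

def z_reliability_factor(confidence):
    """z value that leaves alpha/2 of the probability in the right-hand tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for confidence in (0.90, 0.95, 0.99):
    print(f"{confidence:.0%}: z = {z_reliability_factor(confidence):.3f}")
# 90%: z = 1.645
# 95%: z = 1.960
# 99%: z = 2.576
```

The 99% value prints as 2.576, which is why tables show it as 2.575 or 2.58 depending on rounding.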
Example
Assume that the standard deviation of SAT verbal scores in a school system is known to be 100. A
researcher wishes to estimate the mean SAT score and compute a 95% confidence interval from a random
sample of 10 scores.
The 10 scores are: 320, 380, 400, 420, 500, 520, 600, 660, 720, and 780. Therefore, $m = 530$, $N = 10$, and
$\sigma_m = 100 / \sqrt{10} = 31.62$. The value of z for the 95% confidence interval is the number of standard
deviations one must go from the mean (in both directions) to contain .95 of a normal distribution.
It turns out that one must go 1.96 standard deviations from the mean in both directions to contain .95 of the
distribution. The value of 1.96 was found using a z table. Since each tail is to contain .025 of the distribution, you find
the value of z below which 1 - 0.025 = 0.975 of the distribution lies. This value is 1.96.
All the components of the confidence interval are now known: $m = 530$, $\sigma_m = 31.62$, $z = 1.96$.
Lower limit = 530 - (1.96)(31.62) = 468.02
Upper limit = 530 + (1.96)(31.62) = 591.98
Therefore, $468.02 \le \mu \le 591.98$. This means that the experimenter can be 95% confident that the mean SAT score in
the school system is between 468 and 592. More precisely, if the experimenter repeatedly took samples
from the population and calculated a number of different 95% confidence intervals using the sample
information, on average 95% of those intervals would contain $\mu$. Notice that this is a rather wide
interval. Naturally, if a larger sample size had been used, the interval would have been narrower.
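The arithmetic of this example can be checked with a short script. It uses the scores, the known standard deviation of 100, and z = 1.96 from the example; only the variable names are new.

```python
import math
import statistics

scores = [320, 380, 400, 420, 500, 520, 600, 660, 720, 780]
sigma = 100   # population standard deviation, assumed known
z = 1.96      # reliability factor for 95% confidence

m = statistics.mean(scores)                  # sample mean
sigma_m = sigma / math.sqrt(len(scores))     # standard error of the mean
lower, upper = m - z * sigma_m, m + z * sigma_m

print(f"{lower:.2f} <= mu <= {upper:.2f}")   # 468.02 <= mu <= 591.98
```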
The computation of the 99% confidence interval is exactly the same except that 2.58 (more precisely, 2.576) rather than 1.96 is
used for z. Using z = 2.576, the 99% confidence interval is: $448.54 \le \mu \le 611.46$. As it must be, the 99% confidence interval
is wider than the 95% confidence interval.
Summary of Computations
1. Compute $m = \sum X / N$.
2. Compute $\sigma_m = \sigma / \sqrt{N}$.
3. Find z (1.96 for 95% interval; 2.58 for 99% interval).
4. Lower limit = $m - z\,\sigma_m$
5. Upper limit = $m + z\,\sigma_m$
6. Lower limit $\le \mu \le$ Upper limit
Assumptions:
1. Normal distribution
2. $\sigma$ is known
3. Scores are sampled randomly and are independent
There are three other points worth mentioning here:
The point estimate will always lie exactly at the midway mark of the confidence interval. This is because it
is the "best" estimate for $\mu$, and so the confidence interval expands out from it in both directions.
The higher the percentage of confidence, the wider the interval will be. This is because as the percentage is
increased, a wider interval is needed to give us a greater chance of capturing the unknown population value
within that interval.
The width of the confidence interval is always twice the part after the positive or negative sign, that is,
twice the reliability factor x standard error. The width is simply the upper limit minus the lower limit.
It is very rare for a researcher wishing to estimate the mean of a population to already know its standard
deviation. Therefore, the construction of a confidence interval almost always involves the estimation of
both $\mu$ and $\sigma$.
STUDENT'S T-DISTRIBUTION
When $\sigma$ is known, the formula $m - z\,\sigma_m \le \mu \le m + z\,\sigma_m$ is used for a confidence interval. When $\sigma$ is not
known, $s_m = s / \sqrt{N}$ (N is the sample size) is used as an estimate of $\sigma_m$. Whenever the standard
deviation is estimated, the t rather than the normal (z) distribution should be used. The values of t are larger
than the values of z, so confidence intervals when $\sigma$ is estimated are wider than confidence intervals when
$\sigma$ is known. The formula for a confidence interval for $\mu$ when $\sigma$ is estimated is:
$m - t\,s_m \le \mu \le m + t\,s_m$
where $m$ is the sample mean, $s_m$ is an estimate of $\sigma_m$, and t depends on the degrees of freedom and the
level of confidence.
The t-distribution is a symmetrical probability distribution defined by a single parameter known as degrees
of freedom (df). Each value for the number of degrees of freedom defines one distribution in this family of
distributions. Like a standard normal distribution (i.e. a z-distribution), the t-distribution is symmetrical
around its mean. The t-distribution has the following
characteristics.
It is an estimated standardized normal distribution. When n gets larger, t approximates z (s
approaches $\sigma$).
The mean is 0, and the distribution is bell shaped.
There is not one t-distribution, but a family of t-distributions. All t-distributions have the same
mean of 0. Standard deviations of these t-distributions differ according to the sample size, n.
The shape depends on degrees of freedom (n - 1). The t-distribution is less peaked than a standard
normal distribution, and has fatter tails (i.e. more probability in the tails).
$t_{\alpha/2}$ is greater than $z_{\alpha/2}$ for a given level of significance, $\alpha$.
Its variance is $v/(v-2)$ (for $v > 2$), where $v = n - 1$. It is always bigger than 1. As $v$ increases, the
variance approaches 1.
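The variance formula makes the convergence toward the standard normal distribution easy to see. The sketch below simply evaluates $v/(v-2)$ for a few arbitrarily chosen degrees of freedom; the function name is made up for the illustration.

```python
def t_variance(df):
    """Variance of a t-distribution with df degrees of freedom (df > 2)."""
    if df <= 2:
        raise ValueError("the formula v/(v-2) applies only for df > 2")
    return df / (df - 2)

for df in (3, 5, 29, 120, 1000):
    print(df, round(t_variance(df), 4))
```

With 3 degrees of freedom the variance is 3, three times that of the standard normal distribution; by 1,000 degrees of freedom it is within about 0.2% of 1.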
The value of t can be determined from a t table. The degrees of freedom for t are equal to the degrees of
freedom for the estimate of $\sigma_m$, which is N - 1.
A portion of a t-table is presented below:
Level of significance ($\alpha$) for two-tailed test
df     0.20    0.10    0.05    0.02    0.01
1     3.078   6.314  12.706  31.821  63.657
2     1.886   2.920   4.303   6.965   9.925
29    1.311   1.699   2.045   2.462   2.756
30    1.310   1.697   2.042   2.457   2.750
Suppose the sample size (n) is 30, and the level of significance ($\alpha$) is 5%. Then df = n - 1 = 29 and
$t_{\alpha/2} = t_{0.025} = 2.045$ (find the 29 df row, and then move to the 0.05 column).
Example
Assume a researcher is interested in estimating the mean reading speed (number of words per minute) of
high-school graduates and computing the 95% confidence interval. A sample of 6 graduates was taken and
the reading speeds were: 200, 240, 300, 410, 450, and 600. For these data,
$m = 366.6667$
$s_m = 60.9736$
df = 6 - 1 = 5
t = 2.571
Therefore, the lower limit is $m - t\,s_m = 209.904$ and the upper limit is $m + t\,s_m = 523.430$.
Therefore, the 95% confidence interval is: $209.904 \le \mu \le 523.430$.
Thus, the researcher can be 95% confident that the mean reading speed of high-school graduates is between
209.904 and 523.430 words per minute.
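The figures in this example can be reproduced with the standard library. The t value of 2.571 is taken from a t table (5 degrees of freedom, 0.05 two-tailed) rather than computed, and the variable names are chosen for the illustration.

```python
import math
import statistics

speeds = [200, 240, 300, 410, 450, 600]
t = 2.571   # from a t table: 5 df, 0.05 two-tailed

m = statistics.mean(speeds)            # sample mean
s = statistics.stdev(speeds)           # sample standard deviation (N - 1)
s_m = s / math.sqrt(len(speeds))       # estimated standard error of the mean
lower, upper = m - t * s_m, m + t * s_m

print(f"m = {m:.4f}, s_m = {s_m:.4f}")
print(f"{lower:.3f} <= mu <= {upper:.3f}")
```

Note that `statistics.stdev` divides by N - 1, which matches the degrees of freedom used to look up t.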
Summary of Computations
1. Compute $m = \sum X / N$.
2. Compute s.
3. Compute $s_m = s / \sqrt{N}$.
4. Compute df = N - 1.
5. Find t for these df using a t table.
6. Lower limit = $m - t\,s_m$
7. Upper limit = $m + t\,s_m$
8. Lower limit $\le \mu \le$ Upper limit
Assumptions:
1. Normal distribution
2. Scores are sampled randomly and are independent
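The computation steps can be sketched in Python and checked against the reading-speed example. This is a minimal sketch: the function name t_confidence_interval is hypothetical, and the critical t (2.571 for df = 5 at the 95% level) is passed in by the caller after a table lookup rather than computed.

```python
import math
import statistics


def t_confidence_interval(scores, t):
    """Confidence interval for the mean, following the numbered steps.

    `t` is the critical value found in a t table for df = N - 1 (step 5).
    """
    n = len(scores)
    m = statistics.mean(scores)    # step 1: sample mean
    s = statistics.stdev(scores)   # step 2: sample standard deviation (divides by N - 1)
    s_m = s / math.sqrt(n)         # step 3: standard error of the mean
    df = n - 1                     # step 4: degrees of freedom
    lower = m - t * s_m            # step 6
    upper = m + t * s_m            # step 7
    return m, s_m, df, lower, upper


speeds = [200, 240, 300, 410, 450, 600]
m, s_m, df, lower, upper = t_confidence_interval(speeds, t=2.571)
print(f"m = {m:.4f}, s_m = {s_m:.4f}, df = {df}")
print(f"{lower:.3f} <= mu <= {upper:.3f}")  # step 8
```

With the six reading speeds from the example, this reproduces the interval of roughly 209.904 to 523.430 given in the text.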
Discuss the issues surrounding selection of the appropriate sample size
It's all starting to become a little confusing. Which distribution do you use?
When the sample is large (generally 30 or more observations), a z table can always be used to
construct the confidence interval. It does not matter whether the population distribution is normal or whether the
population variance is known. This is because the central limit theorem assures that when the sample
is large, the distribution of the sample mean is approximately normal. However, the t-statistic is more
conservative: the critical t-value is greater than the corresponding critical z-value, so using the t-statistic will result in a wider confidence interval.
However, if there is only a small sample size, a t table has to be used to construct the confidence interval
when the population distribution is normal and the population variance is not known.
If the population distribution is not normal, there is no way to construct a confidence interval from a small
sample (even if the population variance is known).
Therefore, all else equal, you should try to select a sample of at least 30 observations. The larger the sample size, the
more precise (narrower) the confidence interval.
In general, at least one of the following is needed:
A normal distribution for the population.
A sample size that is greater than or equal to 30.
If one or both of the above hold, then a z-table or t-table is used, depending on whether the population variance is known or
unknown. If neither of the above holds, then the question cannot be answered.
A summary of the situation is as follows:
If the population is normally distributed, and the population variance is known, use a z-score irrespective of sample size.
If the population is normally distributed, and the population variance is unknown, use a t-score irrespective of sample size.
If the population is not normally distributed, and the population variance is known, use a z-score only if n >= 30, otherwise it cannot be done.
If the population is not normally distributed, and the population variance is unknown, use a t-score only if n >= 30, otherwise it cannot be done.
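The four cases above can be written as a small decision function. This is an illustrative sketch only; the name choose_statistic and its parameters are hypothetical and simply encode the summary rules as stated.

```python
def choose_statistic(population_normal, variance_known, n):
    """Return "z", "t", or None (no interval can be constructed), per the summary rules."""
    # At least one condition is needed: a normal population or a sample of 30 or more.
    if population_normal or n >= 30:
        # Known population variance -> z-score; unknown -> t-score.
        return "z" if variance_known else "t"
    return None


print(choose_statistic(population_normal=True, variance_known=False, n=6))    # t
print(choose_statistic(population_normal=False, variance_known=True, n=50))   # z
print(choose_statistic(population_normal=False, variance_known=False, n=10))  # None
```

The reading-speed example corresponds to the first call: a normal population is assumed, the variance is unknown, and the sample is small, so a t-score is used.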
6. Common biases in sampling methods.
As has already been mentioned repeatedly, if there are problems with the choice of sample, then the
conclusions that are drawn from a sample could be in error.
There are a number of different types of bias that can creep into samples. It is important to be aware of
them, and have the ability to comment on their possible appearance in the data where appropriate.
Data-snooping bias is the bias in the inference drawn as a result of prying into the empirical results of
others to guide your own analysis.
Finding seemingly significant but in fact spurious patterns in the data is a serious problem in financial
analysis. Although it afflicts all non-experimental sciences, data-snooping is particularly problematic for
financial analysis because of the large number of empirical studies performed on the same datasets. Given
enough time, enough attempts, and enough imagination, almost any pattern can be teased out of any
dataset. In some cases, these spurious patterns are statistically small, almost unnoticeable in isolation. But
because small effects in financial calculations can often lead to very large differences in investment
performance, data-snooping biases can be surprisingly substantial.
For example, after examining the empirical evidence from 1986 to 2002, Professor Minard concludes that a
growth investment strategy produces superior investment performance. After reading about Professor Minard's study, Monica decides to research growth versus value investing using the same
or related historical data used by Professor Minard. Monica's research is subject to data-snooping bias
because, among other things, the pattern Professor Minard found in those data may be spurious.
The best way to avoid data-snooping bias is to examine new data. However, data-snooping bias is difficult
to avoid because investment analysis is typically based on historical or hypothesized data.
Data-snooping bias can easily lead to data-mining bias.
Data-mining is the practice of finding forecasting models by extensive searching through databases for
patterns or trading rules (i.e. repeatedly "drilling" in the same data until you find something). It has a very
specific definition: continually mixing and matching the elements of a database until one "discovers" two
or more data series that are highly correlated. Data-mining also refers more generically to any of a
number of practices in which data can be tortured into confessing anything.
Two signs may indicate the existence of data-mining in research findings about profitable trading
strategies:
1. Many of the variables actually used in the research are not reported. This may indicate
that the researchers were searching through many unreported variables.
2. There is no plausible economic theory available to explain why those strategies work.
To avoid data-mining, analysts should use out-of-sample data to test a potentially profitable trading rule.
That is, analysts should test the trading rule on a data set other than the one used to establish the rule.
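The following sketch illustrates why an out-of-sample test matters. It is a simulation under stated assumptions, not a description of any real study: the "returns" and all candidate "signals" are pure random noise, and every name (pearson, candidates, in_sample, and so on) is hypothetical. Searching many candidates will usually turn up one that looks correlated with returns in the first half of the data, and that signal is then re-tested on the second half.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


rng = random.Random(42)  # fixed seed so the run is repeatable
n_obs, n_candidates = 120, 200

# Simulated data: by construction there is no real relationship anywhere.
returns = [rng.gauss(0, 1) for _ in range(n_obs)]
candidates = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_candidates)]

half = n_obs // 2
# "Mine" the first half: keep the candidate with the largest absolute correlation.
in_sample = [pearson(c[:half], returns[:half]) for c in candidates]
best = max(range(n_candidates), key=lambda i: abs(in_sample[i]))

# Test the selected rule on data that was not used to find it.
out_of_sample = pearson(candidates[best][half:], returns[half:])

print(f"best in-sample correlation:   {in_sample[best]:+.3f}")
print(f"same signal, out-of-sample:   {out_of_sample:+.3f}")
```

Because the selected signal was chosen for its in-sample fit alone, its in-sample correlation overstates what can be expected on fresh data; the out-of-sample figure is the fairer test.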
Sample selection bias occurs when data availability leads to certain assets being excluded from the
analysis. The discrete choice has become a popular tool for assessing the value of non-market goods.
Surveys used in these studies frequently suffer from large non-response which can lead to significant bias
in parameter estimates and in the estimate of mean
Survivorship bias is the most common type of sample selection bias. It occurs when studies are conducted on
databases that have eliminated all companies that have ceased to exist (often due to bankruptcy). The findings
from such studies most likely will be upwardly biased, since the surviving companies will look better than
those that no longer exist. For example, many mutual fund
databases provide historical data about only those funds that are currently in existence. As a result, funds that
have ceased to exist due to closure or merger do not appear in these databases. Generally, funds that have
ceased to exist have lower returns relative to the surviving funds. Therefore, the analysis of a mutual fund
database with survivorship bias will overestimate the average mutual fund return because the database only
includes the better-performing funds. Return data on stocks listed on an exchange is another example of
survivorship bias: it is difficult to collect information on delisted companies, and these companies often have poor performance.
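A small numerical sketch shows the direction of the bias. The fund names and returns below are hypothetical values invented for illustration only; they are not real fund data.

```python
import statistics

# Hypothetical annual returns (%) for illustration only: (name, return, still_exists).
funds = [
    ("Fund A", 9.0, True),
    ("Fund B", 7.0, True),
    ("Fund C", 6.0, True),
    ("Fund D", -4.0, False),  # closed
    ("Fund E", -8.0, False),  # merged away
]

# Average over every fund that ever existed.
all_mean = statistics.mean([r for _, r, _ in funds])

# Average over survivors only, which is what a survivor-only database would show.
survivor_mean = statistics.mean([r for _, r, alive in funds if alive])

print(f"all funds:      {all_mean:.2f}%")
print(f"survivors only: {survivor_mean:.2f}%")
```

In this made-up data set the survivor-only average is well above the average for all funds, which is the overestimate described above.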
Look-ahead bias exists when studies assume that fundamental information is available when it is not. For
example, researchers often assume a person had annual earnings data in January; in reality the data might not
be available until March. This usually biases results upwards.
Time period bias occurs when a test design is based on a time period that may make the results time-period specific. Even the worst performers have months or even years in which they look wonderful. After all, stopped
clocks are right twice a day. To eliminate strategies that have just been lucky, research must encompass many
years. However, if the time period is too long, the fundamental economic structure may have changed during
the time frame, resulting in two data sets that reflect
different relationships.
For FREE Resources
https://www.educorporatebridge.com/freebies.php
Corporate Bridge Blog
Finance News, Articles, Interview Tips etc
https://www.educorporatebridge.com/blog
For Online Finance Courses
For any other enquiry / information
Email [email protected]
https://www.educoporatebridge.com
Disclaimer Please refer to the updated curriculum of CFA level 1 for further information