26133: BUSINESS STATISTICS EXAM NOTES
1 TYPES OF DATA
  1.1 Data Quality (Nominal, Ordinal, Interval, Ratio)
  1.2 Method of data collection
  1.3 Types of graphs
2 DESCRIPTIVE STATISTICS, NUMERICAL MEASURES
  2.1 Numerical Data Summaries
  2.2 Finding Outliers
3 PROBABILITY [1]
  3.1 Miscellaneous Laws
4 DEPENDENCE (CHI2 TEST)
5 PROBABILITY [2]: DISCRETE PROBABILITY DISTRIBUTIONS
  5.1 Binomial Distribution
  5.2 Poisson Distributions
6 PROBABILITY [3]: CONTINUOUS DISTRIBUTIONS
  6.1 Uniform Distribution
  6.2 Normal Distribution
  6.3 Exponential Distributions
7 SAMPLING AND SAMPLING DISTRIBUTIONS
  7.1 Can the sample be assumed to be normal?
  7.2 Standard error of a sample mean
  7.3 Finite correction factor
8 INTERVAL ESTIMATION
  8.1 Estimating the population mean with a large N, using “z”
  8.2 Estimating the population mean, using “t-statistic” (σ unknown)
  8.3 Estimating the population proportion
  8.4 Estimating population variance
  8.5 Estimating sample size
9 HYPOTHESIS TESTING [1 POPULATION]
  9.1 Methodology
  9.2 Rejection and non-rejection regions
  9.3 Types of questions
10 HYPOTHESIS TESTING [2+ POPULATIONS]
11 REGRESSION [1]
12 REGRESSION [2]
1 TYPES OF DATA
1.1 DATA QUALITY (NOMINAL, ORDINAL, INTERVAL, RATIO)
Nominal (purely descriptive)
Ordinal (ordered)
Interval (equal distance between values, no true zero point)
Ratio (has a true zero point)
1.2 METHOD OF DATA COLLECTION
Sampling (small group to represent population)
  o Cheap
Population (everyone)
  o Thorough
Time-series (over time)
  o Shows change
Cross-sectional (once/a snapshot)
  o Cheap/where time is irrelevant
1.3 TYPES OF GRAPHS
Bar chart
  o Sectional comparison/growth
Line graph
Ogive
  o Cumulative frequency (percentage less than)
Pie chart
  o Percentages
Scatter plot
  o Infer trends
2 DESCRIPTIVE STATISTICS, NUMERICAL MEASURES
2.1 NUMERICAL DATA SUMMARIES
2.1.1 Mode
Most frequently occurring value
2.1.2 Median
Central value of the ordered data
2.1.3 Mean
$$\text{Mean} = \mu = \bar{x} = \frac{\sum \text{observations}}{\text{number of observations}}$$
1. SD Mode: [MODE], [MODE], [1].
2. Stat clear: [SHIFT], [MODE], [1].
3. Enter values: [OBSERVATION], [SHIFT], [,], [NUMBER OF OBSERVATIONS], [M+] (repeat for each observation value).
4. Calculate: [SHIFT], [2] (S-VAR), [1] (x̄), [=].
2.1.4 Variance
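$$\text{Population variance: } \sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$$
$$\text{Sample variance: } s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$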
1. Stat clear: [SHIFT], [MODE], [1].
2. Enter values: [OBSERVATION], [SHIFT], [,], [NUMBER OF OBSERVATIONS], [M+] (repeat for each observation value).
3. Calculate standard deviation: [SHIFT], [2] (S-VAR), [2] (xσn, population) OR [3] (xσn-1, sample), [=].
4. Square for variance: [x²], [=].
2.1.5 Standard Deviation
$$\text{Standard Deviation: } \sigma = \sqrt{\sigma^2} \text{ (population)}, \quad s = \sqrt{s^2} \text{ (sample)}$$
1. Stat clear: [SHIFT], [MODE], [1].
2. Enter values: [OBSERVATION], [SHIFT], [,], [NUMBER OF OBSERVATIONS], [M+] (repeat for each observation value).
3. Calculate standard deviation: [SHIFT], [2] (S-VAR), [2] (xσn, population) OR [3] (xσn-1, sample), [=].
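The calculator steps for mean, variance and standard deviation can be cross-checked in Python. This is only a sketch using the standard library's statistics module; the data set is hypothetical and chosen purely for illustration.

```python
import statistics

# Hypothetical observations (not from the notes).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)                 # x-bar (or mu)
pop_variance = statistics.pvariance(data)    # sigma^2, divides by n
sample_variance = statistics.variance(data)  # s^2, divides by n - 1
pop_sd = statistics.pstdev(data)             # sigma, calculator key xσn
sample_sd = statistics.stdev(data)           # s, calculator key xσn-1

print(mean, pop_variance, sample_variance, pop_sd, sample_sd)
```

The population versions divide by n and the sample versions by n − 1, which is the same choice as [2] (xσn) versus [3] (xσn-1) on the calculator.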
2.1.6 Coefficient of Variation
Measure of relative data spread; most meaningful where all values in the data set are positive.
$$\text{Coefficient of Variation} = \frac{s}{\bar{x}}(100) = \frac{\sigma}{\mu}(100)$$
2.2 FINDING OUTLIERS
2.2.1 Z-Score
Z-score describes the distance of a number from the average in terms of standard deviations.
$$\text{Z score} = z_i = \frac{x_i - \bar{x}}{s}$$
For outliers, $|z_i| > 3$
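A minimal sketch of the z-score outlier rule, assuming the sample standard deviation s is used as in the formula above. The function names and the test data are illustrative, not from the notes.

```python
import statistics

def z_scores(values):
    """Return (x_i - x_bar) / s for every value, using the sample SD."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - x_bar) / s for x in values]

def outliers(values, threshold=3.0):
    """Values whose |z| exceeds the threshold (3 in these notes)."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]

# Hypothetical data: nineteen 10s and a single 100.
print(outliers([10] * 19 + [100]))
```

Note that in a very small sample no point can reach |z| > 3, because the outlier itself inflates s.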
2.2.2 Box and whisker plot
Use for irregular/asymmetrical data
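Describes the data set in terms of 5 points: min, q1, median, q3, max → IQR = q3 − q1.
min = q1 − 1.5(IQR) (lower limit; values below it are outliers)
q1 = split again (median of the lower half)
median = central data point
q3 = split again (median of the upper half)
max = q3 + 1.5(IQR) (upper limit; values above it are outliers)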
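The box and whisker limits can be sketched as follows. This assumes the "split again" method (q1 and q3 are the medians of the lower and upper halves, excluding the median itself when n is odd); other quartile conventions give slightly different values. The function name and data are hypothetical.

```python
import statistics

def box_plot_summary(values):
    """Quartiles, IQR and the 1.5 * IQR limits; needs at least two values."""
    data = sorted(values)
    half = len(data) // 2
    q1 = statistics.median(data[:half])    # median of the lower half
    q3 = statistics.median(data[-half:])   # median of the upper half
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": statistics.median(data),
        "q3": q3,
        "iqr": iqr,
        "lower_limit": q1 - 1.5 * iqr,
        "upper_limit": q3 + 1.5 * iqr,
    }

summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

Here the value 100 lies above the upper limit, so it would be drawn as an outlier.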
3 PROBABILITY [1]
3.1 MISCELLANEOUS LAWS
Sum of probabilities = 1 = 100%
p' = 1 − p
3.1.1 Intersection
Both occur: P(A∩B)
3.1.2 Union
Either A or B or both occurring: P(A∪B)
P(A∪B) = P(A) + P(B) − P(A∩B)
3.1.3 Conditional Probability
Probability of A occurring given that B has already occurred
$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$
          B           B’           Total
A         P(A∩B)      P(A∩B’)      P(A)
A’        P(A’∩B)     P(A’∩B’)     P(A’)
Total     P(B)        P(B’)        1
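As a quick check of the complement, union and conditional rules, here is a small sketch. The three input probabilities are the ones read off the Retail/W probability table in section 4 of these notes; the variable names are illustrative.

```python
# Figures as read from the probability table in section 4.
p_w = 0.78              # P(W)
p_retail = 0.35         # P(Retail)
p_w_and_retail = 0.21   # P(W ∩ Retail)

p_not_w = 1 - p_w                                   # complement: p' = 1 - p
p_w_or_retail = p_w + p_retail - p_w_and_retail     # union rule
p_w_given_retail = p_w_and_retail / p_retail        # conditional probability

print(round(p_not_w, 2), round(p_w_or_retail, 2), round(p_w_given_retail, 2))
```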
4 DEPENDENCE (CHI2 TEST)
4.1.1 Observed Data
Insert observed data into a probability table
4.1.2 Probability from observations
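$$\text{Probability from observation} = \frac{\text{Observation}}{\sum\sum}$$
(ΣΣ = grand total of all observations, TT in the tables)
4.1.3 Predicted results if events are independent
$$\text{Predicted result if events are independent} = \frac{\text{Column total} \times \text{Row total}}{\sum\sum}$$
Events as independent    W          W'          Total
Retail                   546        154         700   (sum RP)
Sale                     1014       286         1300  (sum RP')
Total                    1560       440         2000  (TT)
                         (sum W)    (sum W')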
4.1.4 Chi2 Test
1. Create table: for each cell,
$$\chi^2_{\text{cell}} = \frac{(\text{Observed result} - \text{predicted result if independent})^2}{\text{predicted result if independent}}$$
2. Total all cells: Total = Chi2 value
Observed data            W          W'          Total
Retail                   420        280         700   (sum RP)
Sale                     1140       160         1300  (sum RP')
Total                    1560       440         2000  (TT)
                         (sum W)    (sum W')

Probability              P(W)       P'(W)       Total
P(Retail)                0.21       0.14        0.35  P(RP)
P'(Sale)                 0.57       0.08        0.65  P'(RP)
Total                    0.78       0.22        1     (TT)
                         P(W)       P'(W)
Compare the Chi2 value with the Chi2 critical value [found by entering the degrees of freedom, (number of rows − 1)(number of columns − 1), and the alpha value, (1 − certainty required), into the chi2 tables]. If chi2 > chi2 critical value, then the variables are dependent.
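The whole chi2 procedure can be sketched in a few lines using the observed Retail/Sale table from these notes. The critical value 3.841 is the chi2 table entry for 1 degree of freedom at alpha = 0.05; the 95% certainty level is an assumption, since the notes do not fix one for this example.

```python
# Observed counts from the notes: rows = Retail, Sale; columns = W, W'.
observed = [[420, 280],
            [1140, 160]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Predicted count if independent = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sum (observed - expected)^2 / expected over every cell.
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_CHI2 = 3.841   # table value for df = 1, alpha = 0.05 (assumed alpha)
dependent = chi2 > CRITICAL_CHI2

print(expected, round(chi2, 1), df, dependent)
```

The expected counts reproduce the "events as independent" table (546, 154, 1014, 286), and the chi2 value of roughly 203.3 is far above the critical value, so the variables are dependent.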
5 PROBABILITY [2]: DISCRETE PROBABILITY DISTRIBUTIONS
Countable number of possible outcomes
1. Determine the type of distribution
   a. Binomial Distribution
   b. Poisson Distribution
2. What is the question?
   a. Probability of x? Probability of more or less than?
   b. DRAW
3. Get the formula
4. Apply the terms
5.1 BINOMIAL DISTRIBUTION
$$P(x) = \binom{n}{x} p^x q^{n-x} = \frac{n!}{x!\,(n-x)!} p^x q^{n-x}$$
x = number of successes required
n = number of trials
p = probability of success
q = probability of failure = 1 − p
$$f(x = a) = {}^{n}C_{a} \cdot p^{a} \cdot q^{n-a}$$
f(x) = probability of x successes in n trials
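A short sketch of the binomial formula, with a helper for "at most x" questions built by adding up the individual probabilities. The function names are illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(exactly x successes in n trials), success probability p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

def binomial_at_most(x, n, p):
    """P(x or fewer successes): sum the individual probabilities."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

# Hypothetical question: exactly 2 heads in 4 fair coin tosses.
print(binomial_pmf(2, 4, 0.5))
```

"More than x" questions follow from the complement: 1 − binomial_at_most(x, n, p).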
5.2 POISSON DISTRIBUTIONS
$$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$$
λ = mean of Poisson distribution
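The Poisson formula translates directly into code. A minimal sketch, with a hypothetical mean of 2 events per interval:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(exactly x events) when the mean number of events is lam."""
    return lam ** x * exp(-lam) / factorial(x)

# Hypothetical: mean of 2 arrivals per interval, probability of none.
print(poisson_pmf(0, 2))
```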
6 PROBABILITY [3]: CONTINUOUS DISTRIBUTIONS
Working strictly with probabilities (percentages etc), found as areas under the distribution curve
6.1 UNIFORM DISTRIBUTION
This one looks like a rectangle; you merely need to find the area.
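Finding the area of the rectangle amounts to width times height, where the height is 1/(b − a) for a distribution running from a to b. A sketch under that assumption (the function name and numbers are illustrative):

```python
def uniform_probability(lo, hi, a, b):
    """P(lo <= x <= hi) for x uniform on [a, b]: width * height 1/(b - a)."""
    lo, hi = max(lo, a), min(hi, b)      # clip the interval to [a, b]
    return max(hi - lo, 0) / (b - a)

# Hypothetical: x uniform between 0 and 10, probability of 2 <= x <= 5.
print(uniform_probability(2, 5, 0, 10))
```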
6.2 NORMAL DISTRIBUTION
6.2.1 Probability density function of the normal distribution
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left[\frac{x-\mu}{\sigma}\right]^2}$$
6.2.2 Standardization (z-scores)
$$z = \frac{x - \mu}{\sigma}$$
Then look the z score up in the z distribution table (the table gives a one-sided area)
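Standardizing and then reading the table can be mimicked with the standard library's NormalDist, whose cdf gives the area to the left of z. The numbers below are hypothetical.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x: distance from the mean in standard deviations."""
    return (x - mu) / sigma

standard_normal = NormalDist()          # mean 0, standard deviation 1

# Hypothetical: x = 85 from a normal population with mu = 70, sigma = 10.
z = z_score(85, mu=70, sigma=10)
p_below = standard_normal.cdf(z)        # area to the left of z
p_above = 1 - p_below                   # right-tail area

print(z, round(p_below, 4), round(p_above, 4))
```

Printed z tables often give the area between 0 and z instead, which is p_below − 0.5 for a positive z.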
6.3 EXPONENTIAL DISTRIBUTIONS
6.3.1 Probability density function of the exponential distribution
$$f(x) = \lambda e^{-\lambda x}$$
x and λ must be greater than zero
6.3.2 Probability of the right tail of the exponential distribution
$$P(x \geq x_0) = e^{-\lambda x_0}$$
x0 must be greater than 0
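A sketch of the right-tail formula, with the left-tail probability obtained as its complement. The function name and the example rate are assumptions for illustration.

```python
from math import exp

def exponential_right_tail(x0, lam):
    """P(x >= x0) = e^(-lam * x0) for an exponential distribution."""
    if x0 < 0 or lam <= 0:
        raise ValueError("x0 must not be negative and lam must be positive")
    return exp(-lam * x0)

# Hypothetical: lam = 1 event per minute, P(waiting at least 1 minute).
p_right = exponential_right_tail(1, 1)
p_left = 1 - p_right                     # P(x < x0)
print(round(p_right, 4), round(p_left, 4))
```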
7 SAMPLING AND SAMPLING DISTRIBUTIONS
7.1 CAN THE SAMPLE BE ASSUMED TO BE NORMAL?
If sample size n > 30: yes (central limit theorem)
If population is normal: yes
7.2 STANDARD ERROR OF A SAMPLE MEAN
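For infinite population
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
For finite population
$$\sigma_{\bar{x}} = \left(\frac{\sigma}{\sqrt{n}}\right)\left(\sqrt{\frac{N-n}{N-1}}\right)$$
N = observations in population
n = observations in sample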
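7.3 FINITE CORRECTION FACTOR
This is necessary when n/N > 0.05
For proportions
$$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} \sqrt{\frac{N-n}{N-1}}$$
$$\hat{p} = \text{proportion} = \frac{x}{n}$$
x = number of items in sample with the requisite characteristic
For quantitative data
$$\sigma_{\bar{x}} = \left(\frac{\sigma}{\sqrt{n}}\right)\left(\sqrt{\frac{N-n}{N-1}}\right)$$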
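The standard error formulas, with the finite correction factor switched on only when n/N > 0.05, might be sketched like this. The function names and the example figures are hypothetical.

```python
from math import sqrt

def finite_correction(n, N):
    """sqrt((N - n) / (N - 1)); applied only when n/N > 0.05."""
    return sqrt((N - n) / (N - 1)) if n / N > 0.05 else 1.0

def standard_error_mean(sigma, n, N=None):
    """sigma / sqrt(n), corrected when a population size N is given."""
    se = sigma / sqrt(n)
    return se if N is None else se * finite_correction(n, N)

def standard_error_proportion(p, n, N=None):
    """sqrt(p * q / n) with q = 1 - p, corrected when N is given."""
    se = sqrt(p * (1 - p) / n)
    return se if N is None else se * finite_correction(n, N)

# Hypothetical: sigma = 10, sample of 100 from a population of 1000.
print(standard_error_mean(10, 100), standard_error_mean(10, 100, N=1000))
```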
8 INTERVAL ESTIMATION
8.1 ESTIMATING THE POPULATION MEAN WITH A LARGE N, USING “Z”
8.1.1 Basic form
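$$\text{point estimate} \pm \text{critical value} \times \text{standard error}$$
If $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ and the sample mean can be greater or less than the population mean, the confidence interval is:
$$\mu = \bar{x} \pm z \frac{\sigma}{\sqrt{n}}$$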
8.1.2 Estimating μ
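$$\mu = \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
zα/2 = z-score that cuts off the one-sided area α/2 outside of the confidence interval
Or
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Usually, zα/2 is taken for a confidence of 95%; see below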
8.1.3 Finding zα/2
1. Draw
2. Look up the tail area α/2 (or the central area 0.5 − α/2, depending on the table) in the z-tables to find zα/2
8.1.4 Add a finite correction factor
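$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \leq \mu \leq \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$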
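A sketch of the z confidence interval, with the finite correction factor applied whenever a population size N is passed in. NormalDist().inv_cdf stands in for the z-table lookup; the function name and figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95, N=None):
    """Interval x_bar +/- z_(alpha/2) * sigma / sqrt(n), optionally corrected."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)     # z_(alpha/2), e.g. 1.96
    se = sigma / sqrt(n)
    if N is not None:                           # finite correction factor
        se *= sqrt((N - n) / (N - 1))
    margin = z * se
    return x_bar - margin, x_bar + margin

# Hypothetical: x_bar = 100, sigma = 15, n = 36, 95% confidence.
low, high = z_confidence_interval(100, 15, 36)
print(round(low, 2), round(high, 2))
```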
8.1.5 If n is small (<30), then you can only use the above formulae if the population is normal
8.2 ESTIMATING THE POPULATION MEAN, USING “T-STATISTIC” (σ UNKNOWN)
8.2.1 T distribution
A distribution that describes the standardized sample mean when σ is unknown and population is normal
8.2.2 T value
Tool used to reach conclusions about null hypothesis
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
8.2.3 T distribution table
To read the table we need the degrees of freedom and the tail area
Degrees of freedom = n − 1
Tail area = α/2
8.2.4 Confidence intervals to estimate the population mean using the t-stat
$$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \leq \mu \leq \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
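The t interval can be sketched in the same way. Python's standard library has no t distribution, so this version takes the critical value as an argument, read from the t table as in the notes. The sample is hypothetical; 2.776 is the table value for α/2 = 0.025 with 4 degrees of freedom.

```python
from math import sqrt
import statistics

def t_confidence_interval(sample, t_critical):
    """x_bar +/- t * s / sqrt(n); t_critical comes from the t table (df = n - 1)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)            # sample SD, divides by n - 1
    margin = t_critical * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical sample of n = 5, so df = 4 and t(0.025, 4) = 2.776.
t_low, t_high = t_confidence_interval([10, 12, 14, 16, 18], 2.776)
print(round(t_low, 3), round(t_high, 3))
```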
8.3 ESTIMATING THE POPULATION PROPORTION
$$z = \frac{\hat{p} - p}{\sqrt{\frac{\hat{p}\hat{q}}{n}}}$$
p̂ = sample proportion
q̂ = 1 − p̂
p = population proportion
n = sample size
8.3.1 Confidence interval to estimate p
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \leq p \leq \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
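A sketch of the proportion interval, again using NormalDist().inv_cdf in place of the z table. The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(x, n, confidence=0.95):
    """p_hat +/- z_(alpha/2) * sqrt(p_hat * q_hat / n), with p_hat = x / n."""
    p_hat = x / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# Hypothetical: 40 of 100 sampled items have the characteristic.
p_low, p_high = proportion_confidence_interval(40, 100)
print(round(p_low, 3), round(p_high, 3))
```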
8.4 ESTIMATING POPULATION VARIANCE
$$s^2 = \frac{\sum (x - \bar{x})^2}{n-1}$$
8.4.1 Chi2 formula for variance
NB: Distribution must be normal to use this formula
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
df = (n − 1)
8.4.2 Confidence interval to estimate the population variance
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \leq \sigma^2 \leq \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
df = (n − 1)
Work χ2 out using χ2(α/2, df) and χ2(1 − α/2, df) and the χ2 tables.
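A sketch of the variance interval. The two χ2 values are passed in as arguments because they come from the χ2 tables; the example uses the table values for df = 9 at 95% confidence (19.0228 and 2.70039) and a hypothetical sample variance.

```python
def variance_confidence_interval(s2, n, chi2_upper_tail, chi2_lower_tail):
    """Interval for sigma^2.

    chi2_upper_tail is the table value for alpha/2, chi2_lower_tail the
    value for 1 - alpha/2, both with df = n - 1.
    """
    df = n - 1
    return df * s2 / chi2_upper_tail, df * s2 / chi2_lower_tail

# Hypothetical: s^2 = 4 from n = 10 observations, 95% confidence (df = 9).
v_low, v_high = variance_confidence_interval(4, 10, 19.0228, 2.70039)
print(round(v_low, 3), round(v_high, 3))
```

The larger table value gives the lower limit, because it sits in the denominator.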
8.5 ESTIMATING SAMPLE SIZE
This is used to find the minimum sample size to fulfill the requirements of a particular confidence level within a certain amount of error.
8.5.1 Sample size when estimating µ
$$n = \frac{z_{\alpha/2}^2 \sigma^2}{E^2} = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$
$$E = (\bar{x} - \mu) = \text{Error of Estimation}$$
You either need to work out E, or it can be given as “to be within .03 of the true population proportion”
Always round up, since you can’t have half-people
8.5.2 Sample size when estimating p
$$n = \frac{z^2 pq}{E^2}$$
Work out z-stat through confidence interval and tables
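Both sample size formulas, including the rounding up, can be sketched as follows. The default p = 0.5 is an assumption (it gives the largest, most cautious n when p is not known), and the function names and figures are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def z_for_confidence(confidence):
    """z_(alpha/2) for a two-sided confidence level, e.g. 1.96 for 95%."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size_for_mean(sigma, error, confidence=0.95):
    """n = (z * sigma / E)^2, always rounded up."""
    return ceil((z_for_confidence(confidence) * sigma / error) ** 2)

def sample_size_for_proportion(error, p=0.5, confidence=0.95):
    """n = z^2 * p * q / E^2, always rounded up; p = 0.5 is an assumed default."""
    z = z_for_confidence(confidence)
    return ceil(z ** 2 * p * (1 - p) / error ** 2)

# Hypothetical: sigma = 15 within 3 units; proportion within .03.
print(sample_size_for_mean(15, 3), sample_size_for_proportion(0.03))
```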
9 HYPOTHESIS TESTING [1 POPULATION]
9.1 METHODOLOGY
1. Specify the thing of interest
2. Formulate H0 and Ha
   a. Draw
3. Define the level of significance
   a. 1 sided or two sided test?
      i. 1 sided for greater or less
      ii. 2 sided for equals
4. Test
   a. Determine the appropriate statistical test
   b. Establish the decision rule
   c. Gather sample data
   d. Analyze the data
5. Conclude/business application
9.2 REJECTION AND NON-REJECTION REGIONS
Via critical values (inside is non-rejection, outside is rejection region)
9.3 USING Z-STAT
9.3.1 Testing hypothesis about a population mean using the z-stat
Z test for a single mean
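$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
Where the result is z, look it up in the z table (row/column) and subtract the table area from 0.5 or 1, depending on the table, to get the tail probability (i.e. the reverse of finding a z score from an area)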
9.3.1.1 EXAMPLE QUESTION
CPA’s average net Y for sole proprietor is $74914 [statistic from 10 years ago]
Test again, n=112, σ=$14530
STEP 1: HYPOTHESISE
H0: µ=$74914
Ha: µ≠$74914
STEP 2: WHICH TEST TO USE?
Sample size is large (n>30), sample mean as stat, therefore z-stat.
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
STEP 3: WHAT ARE THE CRITICAL VALUES?
Confidence required: 95%, therefore α=.05
This test involves an = sign, not a ≤ or ≥ sign, so it is a two tailed test
α/2=.05/2=.025
Each side therefore has a .475 non-rejection area and a .025 rejection area.
Look up .475 in the z table (or .025 in a tail-area table) to find zα/2 = +/- 1.96
STEP 4: FIND TEST STATISTIC
Sample mean = $78695, n = 112, µ = $74914, σ = $14530
$$z = \frac{78695 - 74914}{14530/\sqrt{112}} = 2.75$$
STEP 5: COMPARE TO CRITICAL VALUES
Non-rejection range = +/- 1.96; 2.75 is not in this range, so reject the null hypothesis
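The five steps of the example can be reproduced with the figures given in the notes. The p-value line is an extra check that is not part of the worked example.

```python
from math import sqrt
from statistics import NormalDist

# Figures from the example question.
x_bar, mu0, sigma, n = 78695, 74914, 14530, 112
alpha = 0.05

z = (x_bar - mu0) / (sigma / sqrt(n))              # test statistic
z_critical = NormalDist().inv_cdf(1 - alpha / 2)   # two tailed, about 1.96
reject_h0 = abs(z) > z_critical

# Two-sided p-value (not in the notes; shown as a cross-check).
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 2), round(z_critical, 2), reject_h0)
```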
9.3.2 Testing the mean with a finite population
$$z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}}$$
9.4 USING T-STAT
9.4.1 T-test for µ
P320
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
df = n − 1
9.5 HYPOTHESIS ABOUT A PROPORTION
$$z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}}$$
9.6 HYPOTHESIS ABOUT A VARIANCE
P331
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
df = n − 1
9.7 TYPE 2 ERRORS
Failing to reject the null hypothesis when it is actually false
See p 334
10 HYPOTHESIS TESTING [2+ POPULATIONS]
p399
10.1 Z FORMULA FOR THE DIFFERENCE IN TWO SAMPLE MEANS AND POPULATION VARIANCES
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
μ1 − μ2 = 0
10.1.1 Confidence intervals in estimate of μ1−μ2
(SEE P360)
10.2 T STAT FOR THE DIFFERENCE IN TWO SAMPLE MEANS (VARIANCES UNKNOWN)
(see p365)
10.2.1 Confidence intervals in estimate of μ1−μ2
(see p369)
10.3 STATISTICAL INFERENCES FOR RELATED POPULATIONS
(see p 373)
10.4 STATISTICAL INFERENCES FOR TWO POPULATION PROPORTIONS
(p383)
10.5 STATISTICAL INFERENCES FOR TWO POPULATION VARIANCES
(p390)
Ratio of two sample variances gives F value
11 ANOVA
12 REGRESSION [1]
12.1 SINGLE REGRESSION
y = (intercept) + c1x1 + c2x2 + … + cnxn
If a coefficient's “p-value” in the regression output is smaller than .05, reject the null hypothesis (that the coefficient is 0) and use that variable in the formula
R^2 shows “goodness” of model (0=bad, 1=good)
12.2 MULTIPLE REGRESSION
In multiple regression R^2 is inaccurate (it never falls when more variables are added), so we have to use adjusted R^2
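The notes do not spell out the adjustment. The usual textbook formula is adjusted R^2 = 1 − (1 − R^2)(n − 1)/(n − k − 1), where n is the number of observations and k the number of predictor variables; the sketch below assumes that formula, and the example figures are hypothetical.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept excluded)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Hypothetical: R^2 = 0.8 from 30 observations and 3 predictors.
print(round(adjusted_r_squared(0.8, 30, 3), 4))
```

The adjusted value is always at or below R^2 and drops when an added variable does not explain enough to justify itself.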
12.3 PROBLEMS
Multicollinearity (predictor variables overlap, i.e. are correlated with each other)
13 REGRESSION [2] MORE PROBLEMS
Residual is the difference between predicted and actual results
13.1 F-TEST
H0: all of the coefficients = 0
If F-stat > critical F, reject H0
If significance F < alpha, reject H0
Testing each coefficient, change one at a time to 0, see if there is a change