Probability and Statistics MATH 009 (TIP Reviewer)
James Lindo
Instructor: Mr. Ronrick A. Da-Ano
Reference Book: Elementary Statistics by Ronald Walpole
Statistics: the science concerned with the methods of collecting, organizing, presenting, analyzing and interpreting data.
1. Descriptive Statistics: the discipline of quantitatively describing the main features of a collection of information, or the quantitative description itself.
   *concerned with organizing, summarizing, presenting and interpreting data.
   *only describes the data (mean, median, mode)
2. Inferential Statistics: deals with making generalizations about the population when only part of it is examined.
   *from the word "infer," which means to conclude
Categories of Statistics:
1. Primary: data acquired directly from the source.
2. Secondary: data from studies made by others for another purpose.
Types of Data:
Variable: a particular attribute of interest that is measurable or observable.
Types of Variable:
   a. Quantitative: any attribute that can be measured by numbers (e.g. height, grades, weight, age)
   b. Qualitative: has labels / names rather than numbers
Population and Sample:
   Population: the sum total of all units of analysis (e.g. all TIP students)
   Sample: a subset or portion of the total population
Distribution: a pattern of variation of a variable
Scale of Measurement:
   1. Nominal (categorical): names / labels (gender, course)
   2. Ordinal: order / ranking
   3. Interval: equally spaced values with no true zero (e.g. 75, 80, 83, 90, 100 or IQ: 100, 103, 120, 121)
   4. Ratio: obtained from interval, but with a true zero (e.g. 1.00 = 99-100%, 1.25 = 96-98%)
Notation:
$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$
Properties:
Probability and Statistics (Lecture 1)
MATH 009 Page 1
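*A constant can be factored out and multiplied after doing the summation: $\sum_{i=1}^{n} c\,x_i = c\sum_{i=1}^{n} x_i$
*The summation of a sum of two variables can be done by summing each variable individually and adding the results: $\sum_{i=1}^{n}(x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$
*Do exponents first before multiplying by the coefficient. Extract the coefficient out of the notation first.
*Always check the upper and lower limits.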
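A quick numeric check of the summation properties above, using small illustrative lists `x`, `y` and a constant `c` (made-up values, not from the lecture). The limits i = 1 to n map to Python's `range(1, n + 1)`, with `x[i - 1]` standing for x_i.

```python
# Illustrative values only (not from the lecture)
x = [2, 4, 6, 8]
y = [1, 3, 5, 7]
c = 3
n = len(x)

# Limits i = 1..n map to range(1, n + 1); x[i - 1] is x_i
sum_x = sum(x[i - 1] for i in range(1, n + 1))

# Property 1: a constant can be factored out of the summation
lhs_const = sum(c * xi for xi in x)
rhs_const = c * sum(x)

# Property 2: the summation of a sum is the sum of the summations
lhs_add = sum(xi + yi for xi, yi in zip(x, y))
rhs_add = sum(x) + sum(y)

# Exponent first, then the coefficient: sum(c * x_i^2) = c * sum(x_i^2)
lhs_pow = sum(c * xi ** 2 for xi in x)
rhs_pow = c * sum(xi ** 2 for xi in x)

print(sum_x, lhs_const, rhs_const, lhs_add, rhs_add, lhs_pow, rhs_pow)
# 20 60 60 36 36 360 360
```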
Methods of Collecting Data
1. Objective: may use a measuring device like a meter stick or weighing scale, which aims to accumulate data.
2. Subjective: relies on people's subjective responses, which may all be different (e.g. a survey).
3. Use of existing records: library, publication house.
Methods of Presenting Data
1. Textual form: report / paragraph
2. Tabular form: data in rows and columns
3. Graphical form:
   a. Histogram (bar graph)
   b. Line graph
   c. Pie graph
   d. Stem-and-leaf plot
Frequency Distribution Table (FDT)
Steps:
1. Arrange the numbers by value. Follow the columns × rows layout of the given data.
2. Determine the range (R) = highest value - lowest value.
3. Identify the number of classes (K).
   a. Rule of thumb: 2^K ≥ N (number of observations in the population).
   b. Choose the value of K that makes 2^K just above N, but nearest to N (one step higher).
   c. Determine the class size interval (C) = R / K. It must be a whole number. Then, to determine the classes, start at the lowest value and keep adding C.
4. Tally the data to get the frequency (F) of each class.
5. Compute the class mark (X), which is just the average of the class limits.
6. Compute the relative frequency (RF) = F / N.
7. Determine the true class boundaries (TCB), where LL and UL are the lower and upper class limits:
   a. Lower TCB = LL - 0.5
   b. Upper TCB = UL + 0.5
8. Get the cumulative frequencies (CF): <CF (increasing) and >CF (decreasing).
9. Get the cumulative relative frequencies (RCF).
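Steps 2 and 3 can be sketched as two small Python helpers. Rounding C up with `math.ceil` is an assumption here; the notes only say that C must be a whole number, and rounding up keeps the last class from stopping short of the highest value.

```python
import math

def number_of_classes(n: int) -> int:
    """Smallest K such that 2**K >= n (the 2^K >= N rule of thumb)."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k

def class_size(lowest: int, highest: int, n: int) -> int:
    """Class size C = R / K, rounded up to a whole number (assumed rounding rule)."""
    r = highest - lowest
    return math.ceil(r / number_of_classes(n))

print(number_of_classes(30), class_size(6, 30, 30))  # 5 5
```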
Probability and Statistics (Lecture 2)
Example 1: Create a Frequency Distribution Table using the following data:
6 7 20 21 25
10 8 18 30 23
11 13 21 28 24
12 15 9 27 30
8 16 11 11 29
13 19 7 14 22
Step 1: Arrange the given data in order.
6 10 13 20 25
7 11 14 21 27
7 11 15 21 28
8 11 16 22 29
8 12 18 23 30
9 13 19 24 30
Step 2: Determine the Range (R) = Highest - Lowest. R = 30 - 6 = 24
Step 3: Identify the number of classes (K): 2^K ≥ N, i.e. 2^K ≥ 30, so K = 5 (2^5 = 32).
Step 4: Determine the Class Size Interval (C) = R / K = 24 / 5 = 4.8 ≈ 5
Step 5: Create the table
Class F X RF TCB <CF >CF <RCF >RCF
6 - 10 6 8 20.00% 5.5 - 10.5 6 30 20.00% 100.00%
11 - 15 8 13 26.67% 10.5 - 15.5 14 24 46.67% 80.00%
16 - 20 5 18 16.67% 15.5 - 20.5 19 16 63.33% 53.33%
21 - 25 6 23 20.00% 20.5 - 25.5 25 11 83.33% 36.67%
26 - 30 5 28 16.67% 25.5 - 30.5 30 5 100.00% 16.67%
N = 30
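The remaining steps (tally, class mark, relative frequency, true class boundaries and cumulative columns) can be sketched as one Python function. It assumes integer data and classes of equal width starting at the lowest value, as in Step 4. For the Example 1 data as listed above, this tally gives F = 7, 8, 4, 6, 5 (classes 6 - 10 through 26 - 30), so a scripted tally like this is a handy cross-check on a hand tally.

```python
def frequency_table(data, lowest, width, k):
    """Build FDT rows for integer data: k classes of `width`, starting at `lowest`."""
    n = len(data)
    rows = []
    running = 0
    for i in range(k):
        ll = lowest + i * width                # lower class limit
        ul = ll + width - 1                    # upper class limit
        f = sum(ll <= v <= ul for v in data)   # tally
        running += f
        rows.append({
            "class": (ll, ul),
            "F": f,
            "X": (ll + ul) / 2,                # class mark
            "RF": f / n,                       # relative frequency
            "TCB": (ll - 0.5, ul + 0.5),       # true class boundaries
            "<CF": running,                    # less-than cumulative frequency
        })
    remaining = n
    for row in rows:
        row[">CF"] = remaining                 # greater-than cumulative frequency
        row["<RCF"] = row["<CF"] / n
        row[">RCF"] = remaining / n
        remaining -= row["F"]
    return rows

example1 = [6, 7, 20, 21, 25, 10, 8, 18, 30, 23, 11, 13, 21, 28, 24,
            12, 15, 9, 27, 30, 8, 16, 11, 11, 29, 13, 19, 7, 14, 22]

rows = frequency_table(example1, lowest=6, width=5, k=5)
for row in rows:
    ll, ul = row["class"]
    lo, hi = row["TCB"]
    print(f"{ll}-{ul} F={row['F']} X={row['X']} RF={row['RF']:.2%} "
          f"TCB={lo}-{hi} <CF={row['<CF']} >CF={row['>CF']}")
```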
Sampling: the selection of a subset of individuals from a statistical population.
Types of Variable
1. Quantitative: any attribute that can be measured by numbers (e.g. height, grades, weight, age)
   a. Continuous (ℝ): variables that can take any real value (integers, fractions, decimals, etc.)
   b. Discrete (ℤ): variables where only integers are allowed (85, 86, 87, etc.)
2. Qualitative: has labels / names rather than numbers
Sampling Methods:
1. Random Sampling: all subsets of the population are given an equal probability of being selected.
   a. No bias.
   b. Example: choosing TIP students by drawing names from a fishbowl.
2. Stratified Sampling: the sample is chosen through stratification, the process of dividing members of the population into homogeneous subgroups (strata) before sampling.
   a. Example: from a club, we will choose 50% females and 50% males.
3. Cluster Sampling: the population is commonly clustered by geography or by time frame.
   a. Example: we will choose a few from Quezon City and a few from Marikina.
4. Systematic Sampling: relies on arranging the study population according to some ordering scheme, then selecting members at regular intervals.
   a. Example: students are first arranged according to GPA from lowest to highest, and then we choose.
5. Convenience Sampling: a type of non-probability sampling in which the sample is drawn from the part of the population that is close at hand.
   a. Example: a club is supposed to choose engineering students, but since Marine students are more accessible, we choose them instead.
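Three of the probability sampling methods above can be sketched with Python's `random` module. The club of 40 students (ids, sex, GPA) is entirely hypothetical, and a seeded `random.Random` is used so the draws are repeatable. Cluster and convenience sampling are left out because they depend on how the clusters or the "close at hand" units are defined.

```python
import random

def simple_random_sample(population, n, rng):
    """Random sampling: every subset of size n is equally likely."""
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, fraction, rng):
    """Stratified sampling: split into homogeneous strata, then sample each stratum."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample

def systematic_sample(population, n, order_by, rng):
    """Systematic sampling: order the population, then take every k-th unit."""
    ordered = sorted(population, key=order_by)
    k = len(ordered) // n          # sampling interval
    start = rng.randrange(k)       # random start within the first interval
    return ordered[start::k][:n]

# Hypothetical club of 40 students (all values made up)
rng = random.Random(2015)
students = [{"id": i, "sex": "F" if i % 2 else "M", "gpa": 1.0 + (i % 9) * 0.25}
            for i in range(40)]

srs = simple_random_sample(students, 10, rng)
strat = stratified_sample(students, lambda s: s["sex"], 0.25, rng)  # 5 F + 5 M
syst = systematic_sample(students, 8, lambda s: s["gpa"], rng)
print(len(srs), len(strat), len(syst))  # 10 10 8
```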
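Measures of Central Tendency
1. Mean (Arithmetic Mean)
   Ungrouped Data (Raw Data): $\mu = \frac{\sum x}{N}$
   Grouped Data: $\mu = \frac{\sum fX}{N}$
2. Mode (Most Frequent)
   Ungrouped Data: *observation of the most frequent value
   Grouped Data: $Mo = L + \frac{d_1}{d_1 + d_2}\,c$
3. Median
   Ungrouped Data: the middle value of the ordered data (the average of the two middle values when N is even)
   Grouped Data: $Md = L + \frac{N/2 - {<}CF}{f}\,c$
   Where: L = lower true class boundary of the median (or modal) class; <CF = cumulative frequency before the median class; f = frequency of the median class; d1, d2 = modal class frequency minus the frequency of the class before / after it; c = class size
Example: Recall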
Probability and Statistics (Lecture 3)
6 11 14 20 27
7 11 15 21 27
8 11 16 21 28
8 12 17 22 29
9 13 18 23 30
10 13 19 24 30
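1. Mean: μ = Σx / N = 520 / 30 = 17.33
2. Mode = 11, since it appears 3 times.
   a. Note that there can be more than one mode: 2 modes (bimodal), 3 modes (trimodal) and 4 modes (quadrimodal).
3. Median:
   Since N = 30 is even, the median is the average of the 15th and 16th values: (16 + 17) / 2 = 16.5
   *But if N = 31 (odd), the median is the (N + 1) / 2 = 16th value = 17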
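The ungrouped mean, median and mode can be checked with Python's standard `statistics` module. `statistics.multimode` returns every mode, which covers the bimodal and trimodal cases mentioned above. The extra value 31 used for the odd-N case is hypothetical.

```python
import statistics

# Data from the example above (N = 30), already in order
data = [6, 7, 8, 8, 9, 10, 11, 11, 11, 12, 13, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 21, 22, 23, 24, 27, 27, 28, 29, 30, 30]

mean = sum(data) / len(data)         # Σx / N
median = statistics.median(data)     # even N: average of the 15th and 16th values
modes = statistics.multimode(data)   # list of all modes

print(sum(data), round(mean, 2), median, modes)  # 520 17.33 16.5 [11]

# Odd N: appending a hypothetical 31st value (31) makes the 16th value the median
print(statistics.median(data + [31]))  # 17
```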
Recall:
Class F X RF TCB <CF >CF <RCF >RCF
6 - 10 6 8 20.00% 5.5 - 10.5 6 30 20.00% 100.00%
11 - 15 8 13 26.67% 10.5 - 15.5 14 24 46.67% 80.00%
16 - 20 5 18 16.67% 15.5 - 20.5 19 16 63.33% 53.33%
21 - 25 6 23 20.00% 20.5 - 25.5 25 11 83.33% 36.67%
26 - 30 5 28 16.67% 25.5 - 30.5 30 5 100.00% 16.67%
N = 30
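1. Mean: μ = ΣfX / N = 520 / 30 = 17.33
2. Median: Md = L + ((N/2 - <CF) / f) × c = 15.5 + ((15 - 14) / 5) × 5 = 16.5
   *To get the median class, compute N / 2 = 30 / 2 = 15 and find the class that contains the 15th observation (using <CF): 16 - 20.
3. Mode: Mo = L + (d1 / (d1 + d2)) × c = 10.5 + (2 / (2 + 3)) × 5 = 12.5
   *To get the modal class, look for the class with the highest frequency: 11 - 15 (F = 8), so d1 = 8 - 6 = 2 and d2 = 8 - 5 = 3.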
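A sketch of the grouped-data formulas, using the F, X and lower TCB columns of the table above. Two choices here are assumptions: the mode function uses the first class with the highest F if there is a tie, and treats a missing neighbouring class (at either end of the table) as having frequency 0.

```python
# Grouped data from the FDT above
lower_tcb = [5.5, 10.5, 15.5, 20.5, 25.5]   # lower true class boundary of each class
freqs = [6, 8, 5, 6, 5]                     # F
marks = [8, 13, 18, 23, 28]                 # class marks X
c = 5                                       # class size

def grouped_mean(freqs, marks):
    """μ = ΣfX / N"""
    return sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)

def grouped_median(lower_tcb, freqs, c):
    """Md = L + (N/2 - <CF) / f × c, using the class that holds the (N/2)-th value."""
    half = sum(freqs) / 2
    cf_before = 0
    for lower, f in zip(lower_tcb, freqs):
        if cf_before + f >= half:
            return lower + (half - cf_before) / f * c
        cf_before += f
    raise ValueError("frequencies must not all be zero")

def grouped_mode(lower_tcb, freqs, c):
    """Mo = L + d1 / (d1 + d2) × c, for the (first) class with the highest F."""
    i = max(range(len(freqs)), key=lambda j: freqs[j])
    f_before = freqs[i - 1] if i > 0 else 0            # assumed 0 at the table edge
    f_after = freqs[i + 1] if i + 1 < len(freqs) else 0
    d1 = freqs[i] - f_before
    d2 = freqs[i] - f_after
    return lower_tcb[i] + d1 / (d1 + d2) * c

print(round(grouped_mean(freqs, marks), 2),
      grouped_median(lower_tcb, freqs, c),
      grouped_mode(lower_tcb, freqs, c))  # 17.33 16.5 12.5
```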
Measures of Location:
Where: N = number of samples or the total population; j = the desired percentile, quartile or decile
Percentile: P1 (1%), P2 (2%), P3 (3%), …, P100 (100%)
Quartile: Q1 (25%), Q2 (50%), Q3 (75%) and Q4 (100%)
Decile: D1 (10%), D2 (20%), D3 (30%), … , D10 (100%)
Conversion: Q1 = P25, D2 = P20
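The conversions above (Q_k = P_25k, D_k = P_10k) can be sketched in Python, with `statistics.quantiles(data, n=100)` supplying P1 to P99. Note the assumption: Python's default "exclusive" method places P_j at position j(N + 1)/100 and interpolates between neighbouring values, which may not match the position formula used in class, so treat results as one convention among several.

```python
import statistics

# Same 30 values as in the central tendency example
data = [6, 7, 8, 8, 9, 10, 11, 11, 11, 12, 13, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 21, 22, 23, 24, 27, 27, 28, 29, 30, 30]

def quartile_to_percentile(k):
    """Q_k = P_(25k), e.g. Q1 = P25."""
    return 25 * k

def decile_to_percentile(k):
    """D_k = P_(10k), e.g. D2 = P20."""
    return 10 * k

def percentile(data, j):
    """P_j for 1 <= j <= 99 using statistics.quantiles' default 'exclusive' method."""
    cut_points = statistics.quantiles(data, n=100)   # P1 ... P99
    return cut_points[j - 1]

print(percentile(data, quartile_to_percentile(1)),   # Q1 = P25
      percentile(data, decile_to_percentile(2)),     # D2 = P20
      percentile(data, quartile_to_percentile(3)))   # Q3 = P75
```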
Example:
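Measures of Dispersion (how widely dispersed the data are)
1. Variance
   Ungrouped Data: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
   Grouped Data: $\sigma^2 = \frac{\sum f (X - \mu)^2}{N}$
2. Standard Deviation
   Ungrouped and Grouped Data: $\sigma = \sqrt{\sigma^2}$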
Example:
Example 1: Ungrouped Data (μ = 17.33)
6 11 14 20 27
7 11 15 21 27
8 11 16 21 28
8 12 17 22 29
9 13 18 23 30
10 13 19 24 30
Probability and Statistics (Lecture 4)
Variance:
σ² = Σ(x - μ)² / N = 1630.67 / 30 = 54.36
Standard Deviation:
σ = √54.36 = 7.37
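A sketch of the ungrouped population variance and standard deviation, dividing by N as in the formula above. For comparison, `statistics.pvariance` gives the same population value, while `statistics.variance` divides by N - 1 (the sample variance) and so gives a larger number.

```python
import math
import statistics

data = [6, 7, 8, 8, 9, 10, 11, 11, 11, 12, 13, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 21, 22, 23, 24, 27, 27, 28, 29, 30, 30]

n = len(data)
mu = sum(data) / n
ss = sum((x - mu) ** 2 for x in data)   # Σ(x - μ)²
variance = ss / n                       # population variance: divide by N
sd = math.sqrt(variance)

print(round(ss, 2), round(variance, 2), round(sd, 2))   # 1630.67 54.36 7.37
print(round(statistics.pvariance(data), 2),             # 54.36 (divide by N)
      round(statistics.variance(data), 2))              # 56.23 (divide by N - 1)
```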
Example 2 (Grouped Data): μ = 17.33
Class F X RF TCB <CF >CF <RCF >RCF
6 - 10 6 8 20.00% 5.5 - 10.5 6 30 20.00% 100.00%
11 - 15 8 13 26.67% 10.5 - 15.5 14 24 46.67% 80.00%
16 - 20 5 18 16.67% 15.5 - 20.5 19 16 63.33% 53.33%
21 - 25 6 23 20.00% 20.5 - 25.5 25 11 83.33% 36.67%
26 - 30 5 28 16.67% 25.5 - 30.5 30 5 100.00% 16.67%
N = 30
Variance:
σ² = Σf(X - μ)² / N = 1436.67 / 30 = 47.89 ≈ 48
Standard Deviation:
σ = √48 = 6.93
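The grouped version weights each squared deviation of the class mark by its frequency. Without rounding the variance to 48 first, the standard deviation comes out as about 6.92 rather than 6.93.

```python
import math

# Class marks X and frequencies F from the table above
marks = [8, 13, 18, 23, 28]
freqs = [6, 8, 5, 6, 5]

n = sum(freqs)
mu = sum(f * x for f, x in zip(freqs, marks)) / n                      # ΣfX / N
variance = sum(f * (x - mu) ** 2 for f, x in zip(freqs, marks)) / n    # Σf(X - μ)² / N
sd = math.sqrt(variance)

print(round(variance, 2), round(sd, 2))   # 47.89 6.92
```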
Measure of Variation
1. Interquartile Range (IQR) = 75th percentile - 25th percentile = Q3 - Q1
2. Semi-Interquartile Range (SIQR) = IQR / 2
Example:
6 11 14 20 27
7 11 15 21 27
8 11 16 21 28
8 12 17 22 29
9 13 18 23 30
10 13 19 24 30
IQR = Q3 - Q1 = 23.5 - 11 = 12.5
SIQR = 12.5 / 2 = 6.25
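Quartile conventions differ between textbooks and software, so a computed IQR can differ slightly from a hand value. As a sketch: Python's `statistics.quantiles` with its default "exclusive" method puts Q3 at position 0.75(N + 1) = 23.25 and interpolates, giving Q3 = 23.25 instead of the 23.5 obtained above (Q1 = 11 either way). Passing `method="inclusive"` selects a different convention again.

```python
import statistics

data = [6, 7, 8, 8, 9, 10, 11, 11, 11, 12, 13, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 21, 22, 23, 24, 27, 27, 28, 29, 30, 30]

q1, q2, q3 = statistics.quantiles(data, n=4)   # default method="exclusive"
iqr = q3 - q1                                   # interquartile range
siqr = iqr / 2                                  # semi-interquartile range

print(q1, q3, iqr, siqr)   # 11.0 23.25 12.25 6.125
```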
Measure of Skewness
*measures the symmetry of the distribution around its central tendencies
*measured along the horizontal (x) axis
*If positively skewed, Mode < Median < Mean
*If negatively skewed, Mode > Median > Mean
*If skewness is equal to 0 (symmetric), Mode = Median = Mean
Measure of Kurtosis
*measure of peakedness, or how tall the graph is
*measured along the vertical (y) axis
Probability and Statistics (Lecture 5)
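1. Skewness (Pearson's coefficient)
   Ungrouped and Grouped Data: $Sk = \frac{3(\mu - Md)}{\sigma}$
2. Kurtosis
   Ungrouped Data: $K = \frac{\sum (x - \mu)^4}{N\sigma^4} - 3$
   Grouped Data: $K = \frac{\sum f (X - \mu)^4}{N\sigma^4} - 3$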
Example 1 (Ungrouped Data): μ = 17.33, Median = 16.5, σ = 7.37
6 11 14 20 27
7 11 15 21 27
8 11 16 21 28
8 12 17 22 29
9 13 18 23 30
10 13 19 24 30
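Skewness:
Sk = 3(μ - Md) / σ = 0.34
*positively skewed
Kurtosis:
Σ(x - μ)⁴ / (Nσ⁴) = 1.82, so K = 1.82 - 3 = -1.18
*negative (excess) kurtosis: flatter than the normal curve (platykurtic)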
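A sketch of Pearson's skewness coefficient and the moment kurtosis for the ungrouped data, using the population standard deviation as in the dispersion section. The value printed as `beta2` is Σ(x - μ)⁴ / (Nσ⁴); subtracting 3 gives the excess kurtosis K.

```python
import math
import statistics

data = [6, 7, 8, 8, 9, 10, 11, 11, 11, 12, 13, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 21, 22, 23, 24, 27, 27, 28, 29, 30, 30]

n = len(data)
mu = sum(data) / n
sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)   # population SD
median = statistics.median(data)

skewness = 3 * (mu - median) / sigma                        # Pearson's coefficient
beta2 = sum((x - mu) ** 4 for x in data) / (n * sigma ** 4)
excess = beta2 - 3                                          # K

print(round(skewness, 2), round(beta2, 2), round(excess, 2))   # 0.34 1.82 -1.18
```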
Recall: μ = 17.33, Median = 16.5, σ = 6.93
Class F X RF TCB <CF >CF <RCF >RCF
6 - 10 6 8 20.00% 5.5 - 10.5 6 30 20.00% 100.00%
11 - 15 8 13 26.67% 10.5 - 15.5 14 24 46.67% 80.00%
16 - 20 5 18 16.67% 15.5 - 20.5 19 16 63.33% 53.33%
21 - 25 6 23 20.00% 20.5 - 25.5 25 11 83.33% 36.67%
26 - 30 5 28 16.67% 25.5 - 30.5 30 5 100.00% 16.67%
N = 30
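Skewness:
Sk = 3(μ - Md) / σ = 3(17.33 - 16.5) / 6.93 = 0.36
*positively skewed
Kurtosis:
Σf(X - μ)⁴ / (Nσ⁴) = 1.72, so K = 1.72 - 3 = -1.28
*negative (excess) kurtosis: platykurtic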
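The same two measures for the grouped table, weighting by F and using the grouped median 16.5. Because σ is not rounded to 6.93 here, the kurtosis comes out as about 1.73 (K ≈ -1.27) instead of 1.72; the skewness still rounds to 0.36.

```python
import math

marks = [8, 13, 18, 23, 28]   # class marks X
freqs = [6, 8, 5, 6, 5]       # F
median = 16.5                 # grouped median computed earlier

n = sum(freqs)
mu = sum(f * x for f, x in zip(freqs, marks)) / n
sigma = math.sqrt(sum(f * (x - mu) ** 2 for f, x in zip(freqs, marks)) / n)

skewness = 3 * (mu - median) / sigma
beta2 = sum(f * (x - mu) ** 4 for f, x in zip(freqs, marks)) / (n * sigma ** 4)
excess = beta2 - 3

print(round(skewness, 2), round(beta2, 2), round(excess, 2))   # 0.36 1.73 -1.27
```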
Probability Distribution:
A continuous probability distribution describes a random variable that can assume an uncountably infinite number of possible values. If f(x) is a function from which probability estimates about x are made, then f is called the probability density function of x: pdf(x).
1. Normal Probability Distribution
   $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$, where $-\infty < x < \infty$
2. Standard Normal Distribution: a normal distribution with mean 0 and standard deviation 1, $z \sim N(0, 1)$
   *A normal distribution can be standardized by $z = \frac{x - \mu}{\sigma}$
Example: If a person scored 70 on a test with a mean of 50 and a standard deviation of 10, what is the corresponding z-score?
z = (70 - 50) / 10 = 2
Areas of the Normal Curve (A(z) = area under the curve between 0 and z, read from the normal table):
1. P(0 < Z < Z1) = A(Z1)
2. P(-Z1 < Z < 0) = A(-Z1)
3. P(Z1 < Z < Z2) = A(Z2) - A(Z1)
4. P(-Z1 < Z < -Z2) = A(-Z1) - A(-Z2)
5. P(-Z1 < Z < Z2) = A(-Z1) + A(Z2)
6. P(Z < Z1) = 0.5 + A(Z1), P(Z > Z1) = 0.5 - A(Z1)
7. P(Z < -Z1) = 0.5 - A(Z1), P(Z > -Z1) = 0.5 + A(Z1)
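Instead of reading the normal table, the area A(z) can be computed from the standard normal CDF in Python's `statistics.NormalDist`. The helper `A` below is a hypothetical name mirroring the notation above: it returns the (non-negative) area between 0 and z, so A(-z) = A(z) by symmetry.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)   # standard normal, z ~ N(0, 1)

def z_score(x, mu, sigma):
    """Standardize: z = (x - μ) / σ."""
    return (x - mu) / sigma

def A(z):
    """Area between 0 and z under the standard normal curve (the table value)."""
    return abs(Z.cdf(z) - 0.5)

z = z_score(70, 50, 10)
print(z)                        # 2.0
print(round(A(z), 4))           # 0.4772  P(0 < Z < 2), rule 1
print(round(0.5 + A(z), 4))     # 0.9772  P(Z < 2), rule 6
print(round(0.5 - A(z), 4))     # 0.0228  P(Z > 2), rule 6
print(round(A(-1) + A(2), 4))   # 0.8186  P(-1 < Z < 2), rule 5
```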
M - Probability Distribution
M - Normal Table
M - Permutation and Combination
Probability: synonymous with chance. The probability of an event is a measure of how likely the event is to occur.
Experiment - is a process designed to discover, test or illustrate a truth, principle, or effect.
*Well-defined outcomes = no doubts about the results.
Random experiment - a process for gathering data. It can be repeated under basically the same conditions leading to well-defined outcomes.
Examples of Random Experiments:
1. Tossing a coin.
2. Throwing a pair of dice.
3. Observing the number of students who secure dropping forms per semester.
4. Recording the time it takes to enroll in the BSE Program.
5. Counting the number of commercial breaks in a TV program per show.
Sample Space: the set of all possible outcomes of a random experiment, usually denoted by the letter S.
*It is the total possible outcomes.
*In the Venn diagram, S is the universe.
Example:
1. In the experiment of tossing a coin, the sample space is S = {H, T}.
2. In throwing a die, S = {1, 2, 3, 4, 5, 6}.
Event - is a subset of the sample space, denoted by E or any letter in the alphabet except S.
*In the Venn diagram, E is one of the circles.
Examples:
1. In tossing three fair coins, S = {HHH, HHT, HTH, HTT, THH, THT, TTT, TTH}: 2^n = 2^3 = 8 possibilities.
E = event of getting at least two heads = {HHH, HHT, HTH, THH} = 4 possibilities
2. In throwing a pair of dice, S = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6)…}: 6 × 6 = 36 possibilities.
E = event of getting a sum of 5 = {(1,4), (2,3), (3,2), (4,1)} = 4 possibilities
Operations on Events
1. Union of Two Events (E1 U E2): combine the outcomes of both events.
2. Intersection of Two Events (E1 ^ E2): the outcomes common to both events.
3. Complement of an Event: E' = S - E (all outcomes in S that are not in E).
4. Mutually Exclusive Events: if E1 ^ E2 is null (empty), then E1 and E2 are mutually exclusive.
5. Independent Events: the occurrence of E1 does not affect the probability of E2.
Example: In the random experiment of tossing three coins, the sample space is S = {HTT, HHT, HTH, HHH, THH, THT, TTH, TTT}.
E1 = {HHT, HTH, THH}
E2 = {HTT, HHT, HTH, THH, TTH, TTT}
E3 = {HHH, THT, HTT}
M - Probability, Experiments, Random Experiments
Then,
E1 U E2 = {HHT, HTH, THH, HTT, TTH, TTT}
E2 ^ E3 = {HTT}
E2' = {HHH, THT}
E1 ^ E3 = null (empty), hence E1 and E3 are mutually exclusive events.
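Python's built-in `set` type maps directly onto these event operations: `|` is union, `&` is intersection and `S - E` is the complement. The sketch below builds S with `itertools.product` and reproduces the results above.

```python
from itertools import product

S = {"".join(t) for t in product("HT", repeat=3)}   # 2^3 = 8 outcomes
E1 = {"HHT", "HTH", "THH"}
E2 = {"HTT", "HHT", "HTH", "THH", "TTH", "TTT"}
E3 = {"HHH", "THT", "HTT"}

print(sorted(E1 | E2))    # union
print(sorted(E2 & E3))    # intersection: ['HTT']
print(sorted(S - E2))     # complement of E2: ['HHH', 'THT']
print(E1 & E3 == set())   # True, so E1 and E3 are mutually exclusive

at_least_two_heads = {s for s in S if s.count("H") >= 2}
print(sorted(at_least_two_heads))   # ['HHH', 'HHT', 'HTH', 'THH']
```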
Approaches to Probability
1. A Priori Approach
*You have knowledge beforehand: every outcome is equally likely, so P(E) = n(E) / n(S).
Example 1: S = {HHH, HHT, HTH, HTT, THH, THT, TTT, TTH} = 8 possibilities
E = event of getting at least two heads = {HHH, HHT, HTH, THH} = 4 possibilities
P(E) = 4/8 = 1/2
Example 2: S = {HTT, HHT, HTH, HHH, THH, THT, TTH, TTT} --> 8
E1 U E2 = {HHT, HTH, THH, HTT, TTH, TTT} --> 6
P(E1 U E2) = 6/8 = 3/4
Example 3: E2 ^ E3 = {HTT}
P(E2 ^ E3) = 1/8
Example 4: E2' = {HHH, THT}
P(E2') = 2/8 = 1/4
Example 5: E1 ^ E3 = null (empty)
P(E1 ^ E3) = 0
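The a priori rule P(E) = n(E) / n(S) can be sketched with `fractions.Fraction`, which keeps answers as exact fractions like the ones above. The helper name `a_priori` is hypothetical, and it assumes every outcome in the sample space is equally likely.

```python
from fractions import Fraction
from itertools import product

def a_priori(event, sample_space):
    """P(E) = n(E) / n(S), assuming equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

coins = {"".join(t) for t in product("HT", repeat=3)}
E1 = {"HHT", "HTH", "THH"}
E2 = {"HTT", "HHT", "HTH", "THH", "TTH", "TTT"}
E3 = {"HHH", "THT", "HTT"}
at_least_two_heads = {s for s in coins if s.count("H") >= 2}

print(a_priori(at_least_two_heads, coins))   # 1/2
print(a_priori(E1 | E2, coins))              # 3/4
print(a_priori(E2 & E3, coins))              # 1/8
print(a_priori(coins - E2, coins))           # 1/4
print(a_priori(E1 & E3, coins))              # 0

dice = set(product(range(1, 7), repeat=2))
sum_five = {d for d in dice if sum(d) == 5}
print(a_priori(sum_five, dice))              # 1/9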
2. A Posteriori Approach
*Law of large numbers
*How many times was the experiment done?
Out of 100 experiments:
HTT HHT HTH HHH THH THT TTH TTT
10 13 12 17 18 12 8 10
P(E) of at least two heads = P({HHH, HHT, HTH, THH}) = (17 + 13 + 12 + 18) / 100 = 60/100 = 3/5
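A sketch of the a posteriori approach: the relative frequency of "at least two heads" from the observed counts in the table above, followed by a seeded simulation (100,000 hypothetical repetitions of tossing three fair coins) illustrating the law of large numbers, where the relative frequency settles near the a priori value 1/2.

```python
import random
from fractions import Fraction

# Observed counts from the 100 experiments in the table above
counts = {"HTT": 10, "HHT": 13, "HTH": 12, "HHH": 17,
          "THH": 18, "THT": 12, "TTH": 8, "TTT": 10}
trials = sum(counts.values())
favorable = sum(c for outcome, c in counts.items() if outcome.count("H") >= 2)
print(favorable, trials, Fraction(favorable, trials))   # 60 100 3/5

# Law of large numbers: many simulated repetitions of three fair tosses
rng = random.Random(0)
many = 100_000
hits = sum(
    sum(rng.random() < 0.5 for _ in range(3)) >= 2   # at least two heads out of 3
    for _ in range(many)
)
print(hits / many)   # close to 0.5
```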
3. Subjective Approach: the probability is based on a person's judgment, not on relative frequency.
*Example: Pacquiao vs. Mayweather ("Pacquiao will win," according to X).
Operations on Probability:
1. Addition Rule (U) (Union) (Or)
*P[E1 U E2] = P[E1] + P[E2] - P[E1 ^ E2]
*If E1 and E2 are mutually exclusive events: P[E1 U E2] = P[E1] + P[E2]
2. Multiplication Rule (^) (Intersection) (And) (But)
*P[E1 ^ E2] = P[E1 | E2] × P[E2]
*If E1 and E2 are independent events, P[E1 | E2] = P[E1], so P[E1 ^ E2] = P[E1] × P[E2]
3. The probability of the complement of E is
*P(E') = 1 - P(E)
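Example: Throwing two dice, what is the probability that the sum is not equal to 6?
A sum of 6 occurs in 5 of the 36 outcomes: {(1,5), (2,4), (3,3), (4,2), (5,1)}.
Answer: P(sum ≠ 6) = 1 - 5/36 = 31/36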
4. Conditional Probability
*The probability that A will happen given that B has occurred: P(A | B) = P(A ^ B) / P(B), where P(B) > 0.
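A sketch that checks the probability rules above by enumerating the 36 outcomes of two dice. Events A (first die even) and C (second die even) are illustrative choices, not from the notes; B is the notes' "sum equals 6".

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))   # 36 equally likely outcomes

def P(event):
    return Fraction(len(event), len(S))

def P_given(a, b):
    """Conditional probability P(a | b) = P(a ^ b) / P(b)."""
    return P(a & b) / P(b)

B = {d for d in S if sum(d) == 6}          # sum equals 6
A = {d for d in S if d[0] % 2 == 0}        # first die even (illustrative)
C = {d for d in S if d[1] % 2 == 0}        # second die even (illustrative)

print(P(S - B), 1 - P(B))                  # complement rule: 31/36 31/36
print(P(A | B), P(A) + P(B) - P(A & B))    # addition rule: 7/12 7/12
print(P_given(A, B))                       # 2/5
print(P(A & B), P_given(A, B) * P(B))      # multiplication rule: 1/18 1/18
print(P(A & C), P(A) * P(C))               # independent events: 1/4 1/4
```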
M - More Examples
Random Variable: a rule or function defined over a sample space, denoted by any capital letter of the English alphabet. It assigns a real number to every outcome of the sample space.
Two Types:
1. Discrete Random Variable: a random variable that can assume a finite or countable number of values (e.g. the number of heads in tossing 2 coins, the number of car owners).
2. Continuous Random Variable: a random variable that can assume any value in an interval or continuum (e.g. the height of a TIP student, the weight of a newborn baby).
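A sketch of a discrete random variable from the example above: X = number of heads in tossing 2 coins. The dictionary `X` plays the role of the rule that assigns a real number to each outcome, and counting outcomes per value gives its probability distribution.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=2))                     # HH, HT, TH, TT
X = {outcome: outcome.count("H") for outcome in S}    # random variable: heads per outcome

counts = Counter(X.values())
distribution = {x: Fraction(c, len(S)) for x, c in sorted(counts.items())}
print(distribution)   # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```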
M - Random Variables and Probability Distributions
Chapter 15
Wednesday, February 11, 2015 4:34 PM
Chi-Squared Test
*tests how likely it is that an observed distribution is due to chance
*a goodness-of-fit statistic or a test of independence
*tests for a significant relationship between two variables
When to use:
1. A random sampling method is used.
2. Each population is at least 10 times as large as its respective sample.
3. The variables under study are categorical.
4. The expected frequency count for each cell of the table is at least 5.
Steps:
1. State the null (H0) and alternative (HA) hypotheses.
   *H0: Variables A and B are independent; HA: Variables A and B are dependent.
2. Degrees of Freedom, Expected and Observed Frequencies, Chi-Squared
   *Degrees of Freedom: DF = (r - 1) × (c - 1)
   *Expected Frequency: E(r,c) = (n_r × n_c) / n, where n_r and n_c are the row and column totals and n is the overall total
   *Observed Frequency: O(r,c) = based on the table
   *Test Statistic: χ² = Σ (O - E)² / E
3. Level of Significance (α = 0.05)
4. Decision Rule: if the significance level α > P-value, reject H0. Otherwise, fail to reject H0. State the conclusion.
F - Chi Square Test
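ANOVA: a statistical comparison of at least two populations
One-Way Analysis of Variance: a technique used to compare the means of three or more samples
Formula: n = total population size; p = number of groups; T = total of each group (column); n_j = size of each group
Correction term: $\frac{(\sum x)^2}{n}$ (square of the sum divided by the population)
Total SS = $\sum x^2 - \frac{(\sum x)^2}{n}$
Between SS = $\sum \frac{T^2}{n_j} - \frac{(\sum x)^2}{n}$
Within SS = Total SS - Between SS
Between MS = $\frac{\text{Between SS}}{p - 1}$
Within MS = $\frac{\text{Within SS}}{n - p}$
F = $\frac{\text{Between MS}}{\text{Within MS}}$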
F - Analysis of Variance (ANOVA)
*tests all the samples simultaneously, at a single time
*is a technique designed to test whether or not more than two samples (groups) differ significantly from each other
*the t-test, together with the z-test, is used to test the significance of the difference between a single pair of samples
Problem: The data below represent the number of hours of pain relief provided by 4 different brands of headache tablets administered to subjects. It shows the number of hours of relief provided by each brand of tablet.
XA XB XC XD
5 9 3 2
4 7 5 3
8 8 2 4
Step 1: Formulate the null hypothesis.
*The null hypothesis H0 always states that THERE IS NO SIGNIFICANT DIFFERENCE between the samples.
*The alternative hypothesis Ha always states that THERE IS A SIGNIFICANT DIFFERENCE between the samples.
H0: There is no significant difference in the number of hours of relief provided by the 4 different brands of headache tablets.
Step 2: Set the level of significance.
*The level of significance defaults to 0.05 unless otherwise stated in the problem.
α = 0.05
Step 3: Choose the appropriate test statistic.
*The F-test is normally employed since we are comparing variances.
Test Statistic: F-test (ANOVA)
Step 4: Compute the ANOVA
a. Compute the TSS (total sum of squares): TSS = Σx² - (Σx)² / n
*(Square every value, then add them all together) minus (add all the values, square the sum, then divide by the total population)
XA XB XC XD
5 9 3 2
4 7 5 3
8 8 2 4
XA² XB² XC² XD²
25 81 9 4
16 49 25 9
64 64 4 16
Σx² = 105 + 194 + 38 + 29 = 366; Σx = 60; n = 12
TSS = 366 - 60² / 12 = 366 - 300 = 66
Analysis of Variance (ANOVA)
b. Determine the between-column sum of squares (SSB), defined by the formula: SSB = Σ(T² / n_j) - (Σx)² / n
*(Summation of the squared column totals divided by the number of rows) - (sum of all values, squared, over the total population)
XA XB XC XD
5 9 3 2
4 7 5 3
8 8 2 4
T = 17 24 10 9
SSB = (17² + 24² + 10² + 9²) / 3 - 60² / 12 = 1046 / 3 - 300 = 48.67
c. Compute the within-column sum of squares (SSW), defined by: SSW = TSS - SSB = 66 - 48.67 = 17.33
d. Construct an analysis of variance table (ANOVA table) as shown below:
ANOVA table on the four samples subjected to different tablets
Source of Variation Sum of Squares Df MSS = SS / Df
Between-column 48.67 3 16.22
Within-column 17.33 8 2.17
Total 66.00 11
*between-column df = columns (k) - 1 = 4 - 1 = 3
*total df = (rows × columns) - 1 = 12 - 1 = 11
*within-column df = total df - between-column df = 11 - 3 = 8
*total SS = SSB + SSW = 48.67 + 17.33 = 66.00
*between-column MSS = SSB / df = 48.67 / 3 = 16.22
*within-column MSS = SSW / df = 17.33 / 8 = 2.17
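e. Compute the F-test (Fisher) value:
*F = between-column MSS / within-column MSS = 16.222 / 2.167 ≈ 7.49
f. Locate the tabular value of F.
*To do this, use the between-column df (3) and the within-column df (8) at α = 0.05.
FTV = 4.07
Decision Rules:
a. If F > FTV, reject H0.
b. If F ≤ FTV, accept (fail to reject) H0.
Since 7.49 > 4.07, reject H0: there is a significant difference in the number of hours of relief provided by the 4 brands of headache tablets.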
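The one-way ANOVA steps above can be sketched as a single Python function built from the same sums-of-squares formulas. The tabular value 4.07 is taken from the notes rather than computed. If SciPy happens to be installed, `scipy.stats.f_oneway(*groups)` should give the same F statistic (plus a p-value) as an independent check.

```python
def one_way_anova(groups):
    """One-way ANOVA using the sums-of-squares formulas in these notes."""
    values = [x for g in groups for x in g]
    n, p = len(values), len(groups)
    correction = sum(values) ** 2 / n                        # (Σx)² / n
    total_ss = sum(x * x for x in values) - correction       # TSS
    between_ss = sum(sum(g) ** 2 / len(g) for g in groups) - correction   # SSB
    within_ss = total_ss - between_ss                        # SSW
    df_between, df_within = p - 1, n - p
    ms_between = between_ss / df_between
    ms_within = within_ss / df_within
    return {
        "SST": total_ss, "SSB": between_ss, "SSW": within_ss,
        "df": (df_between, df_within),
        "MSB": ms_between, "MSW": ms_within,
        "F": ms_between / ms_within,
    }

# Hours of relief per brand (columns XA, XB, XC, XD of the table)
groups = [[5, 4, 8], [9, 7, 8], [3, 5, 2], [2, 3, 4]]
result = one_way_anova(groups)
F_TABULAR = 4.07   # F(3, 8) at α = 0.05, as read from the F table in the notes

print({k: (round(v, 2) if isinstance(v, float) else v) for k, v in result.items()})
print("reject H0" if result["F"] > F_TABULAR else "fail to reject H0")   # reject H0
```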
Test the hypothesis that educational attainment does not depend on socio-economic status for the following 100 persons in a particular community.
Finished College Did Not Finish College Total
Poor 18 10 28
Middle Class 28 24 52
Rich 14 6 20
Total 60 40 100
Step 1: State the null hypothesis (H0).
*H0: Variables A and B are independent.
Null Hypothesis (H0): Socio-economic status is independent of educational attainment.
Step 2. Degrees of Freedom, Expected and Observed Frequencies, Chi-Squared
a. Degrees of Freedom DF = (r - 1) × (c - 1)
   *Rows (not including total) = 3
   *Columns (not including total) = 2
   DF = (3 - 1) × (2 - 1) = 2
b. FE (Expected Frequency) = (row total × column total) / overall total
c. FO (Observed Frequency) = based on the table
d. Test Statistic χ² = Σ (O - E)² / E
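                Finished College       Did Not Finish College   Total
Poor            FO = 18 / FE = 16.8    FO = 10 / FE = 11.2      28
Middle Class    FO = 28 / FE = 31.2    FO = 24 / FE = 20.8      52
Rich            FO = 14 / FE = 12      FO = 6 / FE = 8          20
Total           60                     40                       100
χ² = (18 - 16.8)²/16.8 + (10 - 11.2)²/11.2 + (28 - 31.2)²/31.2 + (24 - 20.8)²/20.8 + (14 - 12)²/12 + (6 - 8)²/8 ≈ 1.87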
Chi-Square Test
Step 3. Level of Significance α = 0.05
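Step 4. Get the tabulated value.
*To get it, use the coordinates (level of significance, df) = (0.05, 2) in the chi-square table: TV = 5.991.
*Since χ² ≈ 1.87 < 5.991, fail to reject H0: socio-economic status is independent of educational attainment.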
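A sketch of the chi-square test of independence for the 3 × 2 table above, computing expected frequencies as (row total × column total) / overall total. The critical value 5.991 (α = 0.05, df = 2) is hard-coded from a chi-square table. If SciPy is available, `scipy.stats.chi2_contingency` performs the same test and also returns a p-value.

```python
def chi_square_independence(table):
    """Return (chi-square, df, expected) for an r × c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected

# Rows: Poor, Middle Class, Rich; columns: Finished College, Did Not Finish College
observed = [[18, 10], [28, 24], [14, 6]]
chi2, df, expected = chi_square_independence(observed)
CRITICAL = 5.991   # chi-square table value at α = 0.05, df = 2

print(expected)    # [[16.8, 11.2], [31.2, 20.8], [12.0, 8.0]]
print(round(chi2, 2), df, "reject H0" if chi2 > CRITICAL else "fail to reject H0")
```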
Comparing two sample means:
$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
Problem: In a study of abstract reasoning, a sample group of male and female students scored as shown below:
Gender Sample Size Mean Standard Deviation
Male 95 29.25 10.83
Female 85 30.72 8.72
Step 1: Get the Null Hypothesis (H0)
*The two samples are assumed to be independent and normally distributed.
HO = There is no significant difference between sample 1 and sample 2.
Step 2: Get the level of significance
α = 0.10
Step 3: Use the appropriate statistic
Use the z-test since each sample size is greater than 30.
Test Statistic: Z-test
Step 4: Get the tabulated value
*To get the tabulated value, use this coordinate: (significance level, two-tailed test)
TV = 1.645
Step 5: Compute for the Z-Value
Step 6: Compare the Calculated Value and the Tabulated Value
CV = -1.00
TV = 1.645
*Decision rule (two-tailed): if |CV| > TV, reject H0. If |CV| ≤ TV, accept (fail to reject) H0.
Z - Test for two population samples
Since |-1.00| = 1.00 < 1.645, accept H0: there is no significant difference between the abstract reasoning scores of the male and female students.
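A sketch of the two-sample z-test with the summary statistics from the problem. The two-tailed critical value is computed with `statistics.NormalDist().inv_cdf` instead of being read from a table; it comes out at about 1.645 for α = 0.10. Unrounded, the z value is about -1.007.

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z = (x̄1 - x̄2) / sqrt(s1²/n1 + s2²/n2)"""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error

alpha = 0.10
z = two_sample_z(29.25, 10.83, 95, 30.72, 8.72, 85)   # male vs female
critical = NormalDist().inv_cdf(1 - alpha / 2)        # two-tailed critical value
reject = abs(z) > critical

print(round(z, 2), round(critical, 3), "reject H0" if reject else "accept H0")
# -1.01 1.645 accept H0
```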