probability and statistics math 009 (tip reviewer)

Instructor: Mr. Ronrick A. Da-Ano
Reference Book: Elementary Statistics by Ronald Walpole

Statistics: concerned with the methods of collecting, organizing, presenting, analyzing, and interpreting data.

1. Descriptive Statistics: the discipline of quantitatively describing the main features of a collection of information, or the quantitative description itself.
*concerned with organizing, summarizing, presenting, and interpreting data.
*describing only (mean, median, mode)
2. Inferential Statistics: deals with making generalizations about a population when only part of it is examined.
*from the word "infer", which means to conclude

Categories of Statistics:

1. Primary: data that have been acquired directly from the source.
2. Secondary: data from studies made by others for another purpose.

Types of Data:

Variable: is a particular attribute of interest that is measurable or observable

a. Quantitative: any attribute that can be measured by numbers (e.g. height, grades, weight, age)
b. Qualitative: have labels / names rather than numbers

Types of Variable:

Population and Sample:
Population: the sum total of all units of analysis (e.g. all TIP students)
Sample: a subset or portion of the total population

Distribution: is a pattern of variation of a variable

Scale of Measurement

1. Nominal (categorical): names / labels (gender, course)
2. Ordinal: order / ranking
3. Interval: 75, 80, 83, 90, 100 or IQ: 100, 103, 120, 121
4. Ratio: obtained from interval, 1.00 = 99-100% / 1.25 = 96-98%

Notation:

Properties:

Probability and Statistics (Lecture 1)

MATH 009

*Constants can be multiplied after doing the summation.
*Adding two variables can be done by getting each summation individually and then adding the sums together.
*Do exponents first before multiplying by the coefficient. Extract the coefficient out of the notation first.
*Always check the upper and lower limits.
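The summation properties above can be checked numerically. This is only a small sketch: the values of x, y, and the constant c are made up for illustration and do not come from the lecture.

```python
# Hypothetical sample values (assumed for illustration only).
x = [2, 4, 6]
y = [1, 3, 5]
c = 3

# A constant can be multiplied after doing the summation.
constant_inside = sum(c * xi for xi in x)
constant_outside = c * sum(x)

# The summation of (x + y) equals the two summations added together.
added_inside = sum(xi + yi for xi, yi in zip(x, y))
added_outside = sum(x) + sum(y)

# Do the exponent first, then multiply by the extracted coefficient.
power_inside = sum(c * xi ** 2 for xi in x)
power_outside = c * sum(xi ** 2 for xi in x)

print(constant_inside, constant_outside)
print(added_inside, added_outside)
print(power_inside, power_outside)
```

Each pair printed on the same line should be equal, which is what the properties claim.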


Methods of Collecting Data

1. Objective: uses a measuring device, like a meter stick or weighing scale, to gather data.
2. Subjective: relies on people's subjective responses, which may all be different, as in a survey.
3. Use of existing records: library, publication house

Methods of Presenting Data

1. Textual form: report / paragraph
2. Tabular form: data in rows and columns
3. Graphical form:
a. Histogram (bar graph)
b. Line graph
c. Pie graph
d. Stem and Leaf Plot


Frequency Distribution Table (FDT)

Steps:

1. Arrange the numbers by value. Follow the columns x rows of the given.
2. Determine the range (R) = highest value - lowest value.
3. Identify the number of classes (K).
a. Rule of thumb: 2^k ≥ N (number of population)
b. Choose the value of k which makes the value of 2^k just above N, but nearest to N. One step higher.
c. Determine the Class Size Interval. It must be a whole number. Then, to determine the classes, add it to the lowest value.
4. Tally the data to get the frequency (F) of each class.
5. Compute the Class Mark (X). It is just the average of the class limits.
6. Compute for the relative frequency (RF).
7. Determine the True Class Boundaries (TCB).
a. Lower TCB = LL - 0.5
b. Upper TCB = UL + 0.5
8. Get the Cumulative Frequency (CF), which are <CF (ascending) and >CF (descending).
9. Get the Cumulative relative frequency (RCF)



Example 1: Create a Frequency Distribution Table using the following given:

6 7 20 21 25

10 8 18 30 23

11 13 21 28 24

12 15 9 27 30

8 16 11 11 29

13 19 7 14 22

Step 1: Arrange the given into ordered numbers.

6 10 13 20 25

7 11 14 21 27

7 11 15 21 28

8 11 16 22 29

8 12 18 23 30

9 13 19 24 30

Step 2: Determine the Range (R) = Highest - Lowest. R = 30 - 6 = 24
Step 3: Identify the number of classes (K) using 2^k ≥ N. 2^k ≥ 30, so k = 5 (2^5 = 32).
Step 4: Determine the Class Size Interval (C) = R / K = 24 / 5 = 4.8, rounded up to 5.
Step 5: Create the table

Class     F   X    RF       TCB           <CF   >CF   <RCF      >RCF
6 - 10    6   8    20.00%   5.5 - 10.5    6     30    20.00%    100.00%
11 - 15   8   13   26.67%   10.5 - 15.5   14    24    46.67%    80.00%
16 - 20   5   18   16.67%   15.5 - 20.5   19    16    63.33%    53.33%
21 - 25   6   23   20.00%   20.5 - 25.5   25    11    83.33%    36.67%
26 - 30   5   28   16.67%   25.5 - 30.5   30    5     100.00%   16.67%
N = 30
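The derived columns of the table can be rebuilt from the class limits and the tallied frequencies. The sketch below is only an illustration: the function names number_of_classes and fdt_columns are hypothetical, and the frequencies are taken as given from the F column of the table rather than re-tallied from the raw data.

```python
import math


def number_of_classes(n):
    """Smallest k such that 2**k >= n (the rule of thumb in the steps)."""
    k = 0
    while 2 ** k < n:
        k += 1
    return k


def fdt_columns(lower_limits, class_size, freqs):
    """Build X, RF, TCB, CF and RCF for each class of an FDT."""
    n = sum(freqs)
    rows = []
    less_cf = 0
    for lower, f in zip(lower_limits, freqs):
        upper = lower + class_size - 1
        greater_cf = n - less_cf  # values in this class and all higher classes
        less_cf += f              # values in this class and all lower classes
        rows.append({
            "class": (lower, upper),
            "F": f,
            "X": (lower + upper) / 2,
            "RF": round(100 * f / n, 2),
            "TCB": (lower - 0.5, upper + 0.5),
            "<CF": less_cf,
            ">CF": greater_cf,
            "<RCF": round(100 * less_cf / n, 2),
            ">RCF": round(100 * greater_cf / n, 2),
        })
    return rows


k = number_of_classes(30)             # Step 3
class_size = math.ceil((30 - 6) / k)  # Step 4, rounded up to a whole number
table = fdt_columns([6, 11, 16, 21, 26], class_size, [6, 8, 5, 6, 5])
for row in table:
    print(row)
```

Printing the rows should reproduce the columns of the table in Example 1.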


Sampling: is concerned with selection of a subset of individuals from a statistical population

Types of Variable

1. Quantitative: any attribute that can be measured by numbers (e.g. height, grades, weight, age)
a. Continuous (R) - variables that can take any real value (integers, fractions, decimals, etc.)
b. Discrete (Z) - variables where only integers are allowed (85, 86, 87, etc.)
2. Qualitative: have labels / names rather than numbers

Sampling Methods:

1. Random Sampling - all subsets of the population are given an equal probability.
a. No bias.
b. Choosing selected TIP students from a fishbowl.
2. Stratified Sampling - the sample is chosen through stratification, which is the process of dividing members of the population into homogeneous subgroups before sampling.
a. From a club, we will choose 50% females and 50% males.
3. Cluster Sampling - is commonly clustered by geography or by time frame.
a. We will choose a few from Quezon City and a few from Marikina.
4. Systematic Sampling - relies on arranging the study population according to some ordering scheme.
a. Students will be arranged first according to GPA from lowest to highest and then we'll choose.
5. Convenience Sampling - is a type of non-probability sampling that involves the sample being drawn from the part of the population that is close at hand.
a. The club was supposed to choose engineering students, but since Marine students are more accessible, we will choose them instead.
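Three of the sampling methods above can be sketched in a few lines. Everything here is assumed for illustration: the population of 20 student IDs, the two strata, the sample sizes, and the helper name systematic_sample are hypothetical and not part of the lecture.

```python
import random

# Hypothetical population: 20 student IDs (not from the notes).
population = list(range(1, 21))
rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Random sampling: every member has the same chance of being drawn.
random_sample = rng.sample(population, 5)

# Stratified sampling: split into subgroups, then draw from each one.
strata = {"female": population[:10], "male": population[10:]}
stratified_sample = {
    name: rng.sample(members, 3) for name, members in strata.items()
}


# Systematic sampling: order the population, then take every k-th member.
def systematic_sample(ordered, step, start=0):
    return ordered[start::step]


every_fourth = systematic_sample(sorted(population), 4)

print(random_sample)
print(stratified_sample)
print(every_fourth)
```

Cluster and convenience sampling are left out because they depend on how the groups are formed rather than on a drawing rule.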

Measures of Central Tendency

(each measure is computed for Ungrouped Data (Raw Data) and for Grouped Data)
1. Mean (Arithmetic Mean)
2. Mode (Most Frequent)
*Observation of the most frequent value
3. Median

Example (ungrouped data):



6 11 14 20 27

7 11 15 21 27

8 11 16 21 28

8 12 17 22 29

9 13 18 23 30

10 13 19 24 30

1. Mean (μ) = 520 / 30 = 17.33
2. Mode = 11 since it appeared 3 times.
a. Note that there can be more than 1 mode: 2 modes (bimodal), 3 modes (trimodal) and 4 modes (quadmodal).
3. Median:
Since N = 30 is even, the median is the average of the 15th and 16th values: (16 + 17) / 2 = 16.5
*But if N = 31, then the median is the 16th value = 17
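The three measures for the ungrouped example can be cross-checked with Python's standard statistics module. This is a minimal sketch; the list simply copies the 30 values of the example row by row.

```python
import statistics

# The 30 raw values from the ungrouped example, read row by row.
data = [
    6, 11, 14, 20, 27,
    7, 11, 15, 21, 27,
    8, 11, 16, 21, 28,
    8, 12, 17, 22, 29,
    9, 13, 18, 23, 30,
    10, 13, 19, 24, 30,
]

mean = statistics.mean(data)        # sum of the values divided by N
median = statistics.median(data)    # average of the 15th and 16th values
modes = statistics.multimode(data)  # list, since there can be several modes

print(sum(data), len(data))
print(round(mean, 2), median, modes)
```

multimode is used instead of mode so that a bimodal or trimodal data set would show all of its modes.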

Recall:

Class     F   X    RF       TCB           <CF   >CF   <RCF      >RCF
6 - 10    6   8    20.00%   5.5 - 10.5    6     30    20.00%    100.00%
11 - 15   8   13   26.67%   10.5 - 15.5   14    24    46.67%    80.00%
16 - 20   5   18   16.67%   15.5 - 20.5   19    16    63.33%    53.33%
21 - 25   6   23   20.00%   20.5 - 25.5   25    11    83.33%    36.67%
26 - 30   5   28   16.67%   25.5 - 30.5   30    5     100.00%   16.67%
N = 30

1. Mean:
= ∑(F * X) / N = 520 / 30 = 17.33

2. Median:
*To get the median class, N / 2 = 30 / 2 = 15. Get the class whose <CF first reaches the 15th value (16 - 20).
= 15.5 + ((15 - 14) / 5) * 5 = 16.5

3. Mode:
*To get the modal class, look for the class with the highest frequency (11 - 15).
= 10.5 + (2 / (2 + 3)) * 5 = 12.5
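The grouped results (17.33, 16.5 and 12.5) can be reproduced from the frequency table. The formulas in this sketch are the usual textbook interpolation formulas, assumed here because they give the same answers as the notes; the original formula images were not recoverable, and the function names are hypothetical.

```python
lower_limits = [6, 11, 16, 21, 26]
freqs = [6, 8, 5, 6, 5]
class_size = 5
n = sum(freqs)

# Class mark = average of the class limits.
marks = [lo + (class_size - 1) / 2 for lo in lower_limits]

# Grouped mean: sum of F * X divided by N.
mean = sum(f * x for f, x in zip(freqs, marks)) / n


def grouped_median():
    """Interpolate inside the class where the <CF first reaches N / 2."""
    cf_before = 0
    for lo, f in zip(lower_limits, freqs):
        if cf_before + f >= n / 2:
            return (lo - 0.5) + ((n / 2 - cf_before) / f) * class_size
        cf_before += f


def grouped_mode():
    """Interpolate inside the class with the highest frequency."""
    i = freqs.index(max(freqs))
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = freqs[i] - before
    d2 = freqs[i] - after
    return (lower_limits[i] - 0.5) + (d1 / (d1 + d2)) * class_size


median = grouped_median()
mode = grouped_mode()
print(round(mean, 2), median, mode)
```

The lower true class boundary (lower limit - 0.5) is the starting point of both interpolations.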


Measures of Location:

Where:
N = number of samples or the total population
j = percentile, quartile or decile

Percentile: P1 (1%), P2 (2%), P3 (3%), …, P100 (100%)
Quartile: Q1 (25%), Q2 (50%), Q3 (75%) and Q4 (100%)
Decile: D1 (10%), D2 (20%), D3 (30%), …, D10 (100%)

Conversion:
Q1 = P25
D2 = P20

Example:

Measure of Dispersion (how widely dispersed the data are)

(each measure is computed for Ungrouped Data and for Grouped Data)
1. Variance
2. Standard Deviation

Example:

Example 1: Ungrouped Data
μ = 17.33

6 11 14 20 27
7 11 15 21 27
8 11 16 21 28
8 12 17 22 29
9 13 18 23 30
10 13 19 24 30

Variance:

= 54.47

Standard Deviation:

= 7.38

Example 2 (Grouped Data):
μ = 17.33

Class     F   X    RF       TCB           <CF   >CF   <RCF      >RCF
6 - 10    6   8    20.00%   5.5 - 10.5    6     30    20.00%    100.00%
11 - 15   8   13   26.67%   10.5 - 15.5   14    24    46.67%    80.00%
16 - 20   5   18   16.67%   15.5 - 20.5   19    16    63.33%    53.33%
21 - 25   6   23   20.00%   20.5 - 25.5   25    11    83.33%    36.67%
26 - 30   5   28   16.67%   25.5 - 30.5   30    5     100.00%   16.67%
N = 30

Variance:

= 48

Standard Deviation:

= 6.93
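The grouped variance and standard deviation can be checked as follows. This sketch assumes the population form (dividing by N), which is what gives a value close to the 48 in the notes; the notes appear to round the variance to 48 before taking the square root, so the last decimal of the standard deviation can differ slightly.

```python
from math import sqrt

freqs = [6, 8, 5, 6, 5]
marks = [8, 13, 18, 23, 28]  # class marks (X) from the table
n = sum(freqs)

mean = sum(f * x for f, x in zip(freqs, marks)) / n

# Population variance for grouped data: sum of F * (X - mean)^2, over N.
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / n
std_dev = sqrt(variance)

print(round(variance, 2), round(std_dev, 2))
```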


Measure of Variation

1. Interquartile Range (IQR) = 75th percentile - 25th percentile
2. Semi-Interquartile Range (SIQR) = IQR / 2

Example:

6 11 14 20 27

7 11 15 21 27

8 11 16 21 28

8 12 17 22 29

9 13 18 23 30

10 13 19 24 30

IQR = 23.5 - 11 = 12.5
SIQR = 12.5 / 2 = 6.25

Measure of Skewness
*Symmetry of the central tendencies.
*The horizontal (x) axis is what is measured.
*If positive skewness, Mode < Median < Mean
*If negative skewness, Mode > Median > Mean
*If skewness is equal to 0, Mode = Median = Mean

Measure of Kurtosis
*Measure of peakedness, or how tall the graph is.
*The vertical (y) axis is what is measured.



Ungrouped Data Grouped Data

Skewness

Kurtosis

Example 1 (Ungrouped Data):
μ = 17.33
Median = 16.5
σ = 7.38

6 11 14 20 27

7 11 15 21 27

8 11 16 21 28

8 12 17 22 29

9 13 18 23 30

10 13 19 24 30

Skewness:

= 3(μ - Median) / σ = 3(17.33 - 16.5) / 7.38 = 0.34
*positively skewed

Kurtosis:

= 1.82 - 3 = -1.18

Example 2 (Grouped Data), recall:
μ = 17.33
Median = 16.5
σ = 6.93

Class     F   X    RF       TCB           <CF   >CF   <RCF      >RCF
6 - 10    6   8    20.00%   5.5 - 10.5    6     30    20.00%    100.00%
11 - 15   8   13   26.67%   10.5 - 15.5   14    24    46.67%    80.00%
16 - 20   5   18   16.67%   15.5 - 20.5   19    16    63.33%    53.33%
21 - 25   6   23   20.00%   20.5 - 25.5   25    11    83.33%    36.67%
26 - 30   5   28   16.67%   25.5 - 30.5   30    5     100.00%   16.67%
N = 30

Skewness:

= 3(μ - Median) / σ = 3(17.33 - 16.5) / 6.93 = 0.36
*positively skewed

Kurtosis:

= 1.72 - 3 = -1.28
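Both skewness values can be reproduced with Pearson's median-based coefficient, 3(mean - median) / standard deviation. That this is the formula behind the notes is an assumption: it was chosen because it reproduces 0.34 and 0.36 from the stated mean, median and standard deviation. The function name is hypothetical.

```python
def pearson_skewness(mean, median, std_dev):
    """Pearson's median-based coefficient of skewness."""
    return 3 * (mean - median) / std_dev


ungrouped = pearson_skewness(17.33, 16.5, 7.38)
grouped = pearson_skewness(17.33, 16.5, 6.93)

for label, value in (("ungrouped", ungrouped), ("grouped", grouped)):
    direction = "positively" if value > 0 else "negatively"
    print(label, round(value, 2), direction, "skewed")
```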


Probability Distribution:

A continuous random variable can assume an uncountably infinite number of possible values. Say we have a function f(x) from which probability estimates about x are made; then the function is called the probability density function of x: pdf(x).

1. Normal Probability Distribution
f(x) = (1 / (σ√(2π))) e^(-(x - μ)^2 / (2σ^2)), where -∞ < x < ∞
2. Standard Normal Distribution is a normal distribution with mean 0 and standard deviation of 1.
*z ~ N(0,1)
*A normal distribution can be standardized by z = (x - μ) / σ

Example:
*If a person scored a 70 in a test with a mean of 50 and a standard deviation of 10, what is the score converted to z?
z = (70 - 50) / 10 = 2
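The standardization in the example, together with the kind of area a normal table would give for the result, can be sketched with statistics.NormalDist from the Python standard library (Python 3.8 or later). The helper name z_score is hypothetical.

```python
from statistics import NormalDist


def z_score(x, mean, std_dev):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mean) / std_dev


z = z_score(70, 50, 10)

standard_normal = NormalDist()  # mean 0, standard deviation 1
area_0_to_z = standard_normal.cdf(z) - 0.5  # A(z), as read from a normal table
area_above_z = 1 - standard_normal.cdf(z)   # 0.5 - A(z)

print(z, round(area_0_to_z, 4), round(area_above_z, 4))
```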

Areas of the Normal Curve:
1. P (0 < Z < Z1) = A(Z1)
2. P (-Z1 < Z < 0) = A(-Z1)
3. P (Z1 < Z < Z2) = A(Z2) - A(Z1)
4. P (-Z1 < Z < -Z2) = A(-Z1) - A(-Z2)
5. P (-Z1 < Z < Z2) = A(-Z1) + A(Z2)
6. P (Z1 below) = 0.5 + A(Z1), P (Z1 above) = 0.5 - A(Z1)
7. P (-Z1 below) = 0.5 - A(Z1), P (-Z1 above) = 0.5 + A(Z1)

M - Probability Distribution


M - Normal Table


M - Permutation and Combination


Probability - is synonymous to chance. The probability of an event occurring is a measure of how likely an event will occur.

Experiment - is a process designed to discover, test or illustrate a truth, principle, or effect.

*Well-defined outcomes = no doubts about the results.

Random experiment - a process for gathering data. It can be repeated under basically the same conditions leading to well-defined outcomes.

Examples of Random Experiments:
1. Tossing a coin.
2. Throwing a pair of dice.
3. Observing the number of students who secure dropping forms per semester.
4. Recording the time it takes to enroll under the BSE Program.
5. Number of commercial breaks in a TV program per show.

Sample Space - is the set of all possible outcomes of a random experiment, usually denoted by the letter S.
*It is the total possible outcomes.
*In the Venn diagram, S is the universe.

Example:
1. In the experiment of tossing a coin, the sample space is S = {H,T}.
2. In throwing a die, S = {1,2,3,4,5,6}.

Event - is a subset of the sample space, denoted by E or any letter in the alphabet except S.

*In the Venn diagram, E is one of the circles.

Examples:
1. In tossing three fair coins,
S = {HHH,HHT,HTH,HTT,THH,THT,TTT,TTH} = 2^n = 2^3 = 8 possibilities.
E = event of getting at least two heads = {HHT,HTH,THH,HHH} = 4 possibilities

2. In throwing a pair of dice,
S = {(1,1),(1,2),(1,3),(1,4),(1,5),(1,6)…} = 6*6 = 36 possibilities
E = event of getting a sum of 5 = {(1,4),(2,3),(3,2),(4,1)} = 4 possibilities
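Both sample spaces are small enough to enumerate directly, which is a handy way to double-check an event by counting. A minimal sketch using itertools.product from the standard library:

```python
from itertools import product

# Tossing three coins: every sequence of H/T of length 3.
coin_space = ["".join(outcome) for outcome in product("HT", repeat=3)]
at_least_two_heads = [o for o in coin_space if o.count("H") >= 2]

# Throwing a pair of dice: every ordered pair (first die, second die).
dice_space = list(product(range(1, 7), repeat=2))
sum_of_five = [pair for pair in dice_space if sum(pair) == 5]

print(len(coin_space), at_least_two_heads)
print(len(dice_space), sum_of_five)
```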

Operations on Events
1. Union of Two Events - Combine
2. Intersection of Two Events - Common Components
3. Complement of an Event = (S - E) or E' (All S components that are not in E)
4. Mutually Exclusive Events - if (E1 ^ E2) = null or empty, then they are mutually exclusive.
5. Independent Events - Event 1 doesn't affect Event 2.

Example: In the random experiment of tossing three coins, the sample space is
S = {HTT,HHT,HTH,HHH,THH,THT,TTH,TTT}
E1 = {HHT,HTH,THH}
E2 = {HTT,HHT,HTH,THH,TTH,TTT}
E3 = {HHH,THT,HTT}

M - Probability, Experiments, Random Experiments


Then,
E1 U E2 = {HHT,HTH,THH,HTT,TTH,TTT}
E2 ^ E3 = {HTT}
E2' = {HHH,THT}
E1 ^ E3 = Null / Empty, hence they are mutually exclusive events.

Approaches to Probability

1. A Priori Approach
*You have knowledge beforehand.

Example 1:
S = {HHH,HHT,HTH,HTT,THH,THT,TTT,TTH} = 8 possibilities
E = event of getting at least two heads = {HHT,HTH,THH,HHH} = 4 possibilities
P(E) = 4/8 = 1/2

Example 2:
S = {HTT,HHT,HTH,HHH,THH,THT,TTH,TTT} --> 8
E1 U E2 = {HHT,HTH,THH,HTT,TTH,TTT} --> 6
P(E1 U E2) = 6/8 = 3/4

Example 3:
E2 ^ E3 = {HTT}
P(E2 ^ E3) = 1/8

Example 4:
E2' = {HHH,THT}
P(E2') = 2/8 = 1/4

Example 5:
E1 ^ E3 = Null / Empty
P(E1 ^ E3) = 0
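Examples 2 to 5 map directly onto Python set operations, with the a priori probability being the size of the event over the size of the sample space. A minimal sketch (the helper name probability is hypothetical):

```python
from fractions import Fraction

S = {"HTT", "HHT", "HTH", "HHH", "THH", "THT", "TTH", "TTT"}
E1 = {"HHT", "HTH", "THH"}
E2 = {"HTT", "HHT", "HTH", "THH", "TTH", "TTT"}
E3 = {"HHH", "THT", "HTT"}


def probability(event):
    """A priori probability: favorable outcomes over total outcomes."""
    return Fraction(len(event), len(S))


union = E1 | E2         # E1 U E2
intersection = E2 & E3  # E2 ^ E3
complement = S - E2     # E2'
exclusive = E1 & E3     # empty set, so E1 and E3 are mutually exclusive

print(probability(union), probability(intersection))
print(probability(complement), probability(exclusive))
```

Fraction keeps the answers as reduced fractions, matching the form used in the examples.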

2. A Posteriori Approach
*Law of large numbers
*How many times was the experiment done?

Out of 100 experiments:

HTT  HHT  HTH  HHH  THH  THT  TTH  TTT
10   13   12   17   18   12   8    10

P(E) of at least two heads = {HHT,HTH,THH,HHH} = (13+12+18+17) / 100 = 60/100 = 3/5

3. A Subjective Approach (the approach is based on someone's opinion, not on relative frequency)
*Pacquiao vs. Mayweather. (X says Pacquiao will win.)

Operations on Probability:

1. Addition Rule: (U) (Union) (Or)
*P[E1 U E2] = P[E1] + P[E2] - P[E1 ^ E2]
*If they are mutually exclusive events: P[E1 U E2] = P[E1] + P[E2]

2. Multiplication Rule: (^) (Intersection) (And) (But)
*P[E1 ^ E2] = P[E1 | E2] X P[E2]
*If E1 and E2 are independent events, P[E1 | E2] = P[E1], so P[E1 ^ E2] = P[E1] X P[E2]

3. The probability of the complement of E is
*P(E') = 1 - P(E)

Example: Throwing two dice, what is the probability that the sum is not equal to 6?
Answer: 31/36 (5 of the 36 outcomes have a sum of 6, so 1 - 5/36 = 31/36)

4. Conditional Probability
*Probability that A will happen given that B has occurred.
*P(A | B) = P(A ^ B) / P(B)
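The complement example (sum not equal to 6) can be verified by enumeration. The conditional-probability part of the sketch uses an extra event, "the first die shows 1", which is made up for illustration and is not in the notes.

```python
from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))


def probability(event):
    return Fraction(len(event), len(sample_space))


# Complement rule: P(sum is not 6) = 1 - P(sum is 6).
sum_is_six = [o for o in sample_space if sum(o) == 6]
p_not_six = 1 - probability(sum_is_six)

# Conditional probability with a hypothetical event B: first die shows 1.
first_is_one = [o for o in sample_space if o[0] == 1]
both = [o for o in sum_is_six if o in first_is_one]
p_six_given_first_one = probability(both) / probability(first_is_one)

print(p_not_six, p_six_given_first_one)
```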


M - More Examples


Random Variable - a rule or function defined over a sample space, denoted by any capital letter in the English alphabet. It assigns a real number to every outcome of the sample space.

Two Types
1. Discrete Random Variable - a random variable that can assume a finite or a countable number of values. (e.g. number of heads in tossing 2 coins, number of car owners)
2. Continuous Random Variable - a random variable that can assume an interval or continuum of values. (e.g. height of a TIP student, weight of a newborn baby)

M - Random Variables and Probability Distributions


Chapter 15
Wednesday, February 11, 2015 4:34 PM


Chi-Squared Test
*Tests how likely it is that an observed distribution is due to chance.
*Goodness of fit statistic or test of independence
*Significant relationship between two variables

When to use:
1. Random sampling method is used.
2. Each population is at least 10 times as large as its respective sample.
3. Variables under study are categorical.
4. The expected frequency count for each cell of the table is at least 5.

Steps:

1. State null (H0) and alternative hypotheses (HA).
*H0 (Variable A and B are independent) and HA (Variable A and B are dependent)

2. Degrees of Freedom, Expected and Observed Frequencies, Chi-Squared
*Degrees of Freedom DF = (r-1) * (c-1)
*Expected Frequency E(r,c) = (nr * nc) / n (nr and nc are the row and column totals and n is the overall total)
*Observed Frequency O(r,c) = based on the table
*Test Statistic X^2 = ∑ (O - E)^2 / E

3. Level of Significance (α = 0.05)
4. Decision Rule: If significance level α > P-value, reject H0. Otherwise, fail to reject H0.
5. Conclusion.

F - Chi Square Test


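ANOVA - statistical comparison of at least two populations
One-Way Analysis of Variance - technique used to compare the means of three or more samples

Formula:
n = total population size
p = number of groups

Correction term = (∑x)^2 / n (Square of the sum divided by population)

Total SS = ∑x^2 - (∑x)^2 / n

Between SS = ∑(Tj^2 / nj) - (∑x)^2 / n (Tj and nj are the total and the size of group j)

Within SS = Total SS - Between SS

Between MS = Between SS / (p - 1)

Within MS = Within SS / (n - p)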

x =

F - Analysis of Variance (ANOVA)


*tests all the samples simultaneously, at a single time
*is a technique designed to test whether or not more than two samples (a group) differ significantly from each other
*The t-test, together with the z-test, is used to test the significance of the difference between a single pair of samples.

Problem: The data below represent the number of hours of pain relief provided by 4 different brands of headache tablets administered to subjects. It shows the mean number of hours of relief provided by the tablets.

XA XB XC XD

5 9 3 2

4 7 5 3

8 8 2 4

Step 1: Formulate the null hypothesis.
*The Null Hypothesis (Ho) always states that THERE IS NO SIGNIFICANT DIFFERENCE between the samples.
*The Alternate Hypothesis (Ha) always states that THERE IS A SIGNIFICANT DIFFERENCE between the samples.

HO: There is no significant difference in the number of hours of relief provided by the 4 different brands of headache tablets.

Step 2: Set the level of significance
*The level of significance is 0.05 by default unless otherwise stated in the problem!

α = 0.05

Step 3: Choose the appropriate test statistic
*The F-test is normally employed since we're comparing variances.

Test Statistic: F-test (ANOVA)

Step 4: Compute the ANOVA
a. Compute for the TSS (Total Sum of Squares)

*(Square every value and add the squares together) minus (add all the values, square that sum, and divide by the total population)

XA   XB   XC   XD
5    9    3    2
4    7    5    3
8    8    2    4

XA^2 XB^2 XC^2 XD^2
25   81   9    4
16   49   25   9
64   64   4    16

TSS = 366 - (60)^2 / 12 = 366 - 300 = 66

Analysis of Variance (ANOVA)

b. Determine the between-column sum of squares (SSB):

*(Sum of the squared column totals divided by the number of rows) - (Square of the grand total over total population)

XA XB XC XD

5 9 3 2

4 7 5 3

8 8 2 4

SSB = (17^2 + 24^2 + 10^2 + 9^2) / 3 - (60)^2 / 12 = 348.67 - 300 = 48.67

c. Compute the within-column variance or within-column sum of squares:

SSW = TSS - SSB = 66 - 48.67 = 17.33

d. Construct an analysis of variance table (ANOVA table) as shown below:

ANOVA table on the four samples subjected to different tablets

Source of Variation   Sum of Squares   Df   MSS = SS / Df
Between-column        48.67            3    16.22
Within-column         17.33            8    2.17
Total                 66.00            11

*between-column df = columns (k) - 1 = 4 - 1 = 3
*total df = (rows * columns) - 1 = 12 - 1 = 11
*within-column df = total df - between-column df = 11 - 3 = 8
*total = SSB + SSW = 48.67 + 17.33 = 66.00
*between-column MSS = SSB / DF = 48.67 / 3 = 16.22
*within-column MSS = SSW / DF = 17.33 / 8 = 2.17

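e. Compute the F-test (Fisher) formula:

*F = between-column MSS / within-column MSS = 16.22 / 2.17 ≈ 7.5

f. Locate the tabular value of F.
*To do this, use the coordinates (between-column df, within-column df) = (3, 8) at α = 0.05.

FTV = 4.07

Decision Rule: if the computed F > FTV, reject Ho. Since 7.5 > 4.07, reject Ho.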
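The whole ANOVA computation for the tablet data can be sketched as below. It follows the verbal descriptions of TSS and SSB in the notes; the variable names are hypothetical, and the tabular value 4.07 is taken from the notes rather than computed.

```python
groups = {
    "XA": [5, 4, 8],
    "XB": [9, 7, 8],
    "XC": [3, 5, 2],
    "XD": [2, 3, 4],
}

values = [x for column in groups.values() for x in column]
n = len(values)  # total number of observations
p = len(groups)  # number of groups (columns)

# Square of the grand total divided by the number of observations.
correction = sum(values) ** 2 / n

tss = sum(x ** 2 for x in values) - correction
ssb = sum(sum(col) ** 2 / len(col) for col in groups.values()) - correction
ssw = tss - ssb

msb = ssb / (p - 1)  # between-column mean square, df = 3
msw = ssw / (n - p)  # within-column mean square, df = 8
f_value = msb / msw

F_TABULAR = 4.07  # tabular F for df (3, 8) at alpha = 0.05, as in the notes
reject_h0 = f_value > F_TABULAR

print(round(tss, 2), round(ssb, 2), round(ssw, 2))
print(round(msb, 2), round(msw, 2), round(f_value, 2), reject_h0)
```

Because the code carries full precision, the F value can differ in the second decimal from one computed with the rounded mean squares.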

Chi-Squared Test

Test the hypothesis that educational attainment does not depend on socio-economic status for the following 100 persons in a particular community.

                Finished College   Did not finish College   Total
Poor            18                 10                       28
Middle Class    28                 24                       52
Rich            14                 6                        20
Total           60                 40                       100

Step 1: State null (H0).
*H0 (Variable A and B are independent)

Null Hypothesis (HO) = Socio-Economic Status is independent from Educational Attainment

Step 2: Degrees of Freedom, Expected and Observed Frequencies, Chi-Squared

a. Degrees of Freedom DF = (r-1) * (c-1)
*Rows (not including total) = 3
*Columns (not including total) = 2

DF = (r-1)*(c-1) = (3-1)*(2-1) = 2

b. FE (Expected Frequency) = (row total * column total) / overall total

c. FO (Observed Frequency) = based on the table

d. Test Statistic X^2 = ∑ (O - E)^2 / E

                Finished College         Did not finish College   Total
Poor            FO = 18 / FE = 16.8      FO = 10 / FE = 11.2      28
Middle Class    FO = 28 / FE = 31.2      FO = 24 / FE = 20.8      52
Rich            FO = 14 / FE = 12        FO = 6 / FE = 8          20
Total           60                       40                       100

Chi-Square Test


Step 3. Level of Significance α = 0.05

Step 4. Get the tabulated value.*To get it, use the coordinates, (level of significance, df)
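The expected frequencies and the test statistic for the table above can be computed as follows. This is a sketch under one assumption that should be stated loudly: the tabulated value 5.991 is the standard chi-square table entry for df = 2 at α = 0.05 and does not appear in the notes. The variable names are hypothetical.

```python
# Observed frequencies: rows are Poor, Middle Class, Rich;
# columns are Finished College, Did not finish College.
observed = [
    [18, 10],
    [28, 24],
    [14, 6],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency = (row total * column total) / overall total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

TABULATED_VALUE = 5.991  # assumed: chi-square table, df = 2, alpha = 0.05
reject_h0 = chi_square > TABULATED_VALUE

print(expected)
print(round(chi_square, 3), df, reject_h0)
```

With the computed statistic below the tabulated value, this sketch would fail to reject H0, meaning the data do not show that educational attainment depends on socio-economic status.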


Comparing two sample means:

Z = (x̄1 - x̄2) / √(s1^2 / n1 + s2^2 / n2)

Problem: In a study of abstract reasoning, a sample group of male and female students scored as shown below:

Gender   Sample Size   Mean    Standard Deviation
Male     95            29.25   10.83
Female   85            30.72   8.72

Step 1: Get the Null Hypothesis (HO)
*The two samples are independent.

HO = There is no significant difference between sample 1 and sample 2.

Step 2: Get the level of significance
α = 0.10

Step 3: Use the appropriate statistic
Use the Z-test since each sample size is greater than 30.

Test Statistic: Z-test

Step 4: Get the tabulated value
*To get the tabulated value, use this coordinate (significance level, two-tailed test)

TV = 1.645

Step 5: Compute for the Z-Value

Step 6: Compare Calculated Value and Tabulated Value
CV = -1.00
TV = 1.645

*Decision Rule (two-tailed): if |CV| > TV, reject Ho. If |CV| < TV, accept Ho.

Z - Test for two population samples

Since |-1.00| = 1.00 < 1.645, accept Ho.
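The calculated value in Step 5 can be reproduced with the usual two-sample z formula, the difference of the means over the square root of the summed variance-over-size terms. That this is the formula used in the notes is an assumption, made because it gives a value of about -1.0 as in Step 6. The function name is hypothetical.

```python
from math import sqrt


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z statistic for the difference between two independent sample means."""
    return (mean1 - mean2) / sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# Male: n = 95, mean = 29.25, sd = 10.83; Female: n = 85, mean = 30.72, sd = 8.72
z = two_sample_z(29.25, 10.83, 95, 30.72, 8.72, 85)

TABULATED_VALUE = 1.645  # alpha = 0.10, two-tailed, as in the notes
reject_h0 = abs(z) > TABULATED_VALUE

print(round(z, 3), reject_h0)
```

The absolute value is compared with the tabulated value because the test is two-tailed.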
