math 103 01 basic concepts
Learning Material in Mathematics
Math 103: Statistics and Probability
Basic Concepts of Statistics
CJD
Statistics
Statistics can refer to:
• Specific numbers that have been observed
• The observation, presentation, analysis, and interpretation of chance outcomes
Descriptive Statistics – methods concerned with collecting and describing data to yield meaningful information.
Inferential Statistics – methods concerned with analyzing a subset of data to predict or infer something about the entire set of data.
Descriptive Statistics
[Figure: Dollar - Peso Rates, the value of US$1 in pesos by year, 1997 to 2007; vertical axis from 0.00 to 60.00. Source: http://bsp.gov.ph]
Observations
Experiment – any process that generates a set of data
Observations – the recorded information resulting from an experiment
Independent Variables – data held fixed in order to determine the values of the dependent variables
Dependent Variables – data resulting from an experiment when the independent variables are fixed
Types of Data
Quantitative Data (Numerical)
– Discrete (countable)
ex. counts, salary, test scores
– Continuous (no gaps)
ex. weight, time, force, distance, volume
Qualitative Data (Categorical)
– ex. blood type, gender, yes/no, car model, profession
4 Scales of Data
Level: Nominal
Summary: Categories only. Data cannot be arranged in an ordered sequence.
Example: Blood types; yes/no; baseball players: 5 infielders, 10 outfielders, 15 pitchers
Explanation: Categories or names only.

Level: Ordinal
Summary: Categories are ordered, but differences cannot be determined or are meaningless.
Example: Types of autos: 5 compact, 15 mid-size, 20 sport-utility
Explanation: An order is determined by “compact, mid, sport utility.”

Level: Interval
Summary: Differences between values can be found, but there may be no inherent starting point; ratios are meaningless.
Example: Year; seasonal temperatures: 50°F, 75°F, 100°F
Explanation: 90°F is not twice as hot as 45°F.

Level: Ratio
Summary: Like interval, but with an inherent starting point. Ratios are meaningful.
Example: Prices; weights: 50 kilos, 70 kilos, 80 kilos
Explanation: 100 kilos is twice as heavy as 50 kilos.
Where the data is from
Population
– Totality of all observations with which we are concerned
– Parameter: a characteristic of a population
– Census: collection of data from every element of the population
Sample
– Subset of the population
– Statistic: a characteristic of a sample
Samples
Why Sample?
• The population may be too big to observe
  ex. all Filipino citizens
• Costs may be prohibitive
  ex. surveys may be expensive
• Experiments may be destructive
  ex. light bulb life
A biased sampling procedure consistently overestimates or underestimates some characteristic of the population.
Use inferential statistics to generalize information about the population based on information obtained from the sample.
Example: A researcher wants to find out the average weight of 3,000 students in a college. How big must the sample be to have a 5% margin of error?
Slovin’s Formula
\[ n = \frac{N}{1 + N e^{2}} \]
N = population size
n = sample size
e = margin of error
\[ n = \frac{3000}{1 + 3000 \cdot (0.05)^{2}} = \frac{3000}{8.5} \approx 353 \]
When used: when nothing is known about the population. Otherwise, more accurate formulas are available.
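The calculation above can be sketched in a few lines of Python. The function name `slovin_sample_size` is hypothetical, and rounding up to the next whole unit is an assumption; the slide only shows the result 353.

```python
import math

def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Return n = N / (1 + N * e^2), rounded up to a whole unit (assumed rounding rule)."""
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)

# The slide's example: N = 3000 students, e = 5% margin of error.
n = slovin_sample_size(3000, 0.05)
print(n)  # 353
```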
Sampling Methods
• Simple Random Sample – choose the sample so that every subset of n observations from the population has the same chance of being selected
  - Eliminates the possibility of bias
  - Use random numbers, generated with mechanical devices, tables, or computers
• Systematic Sampling – select every k-th element, with the starting point chosen at random
• Stratified Random Sampling – partition the population and select proportional random samples from each subpopulation
• Cluster Sampling – perform simple random sampling only on randomly selected subpopulations
Simple Random Sample Example
Class of 270 students. Want a simple random sample of 10 students.
ROW
0    00157 37071 79553 31062 42411 79371 25506 69135
1    38354 03533 95514 03091 75324 40182 17302 64224
2    59785 46030 63753 53067 79710 52555 72307 10223
3    27475 10484 24616 13466 41618 08551 18314 57700
4    28966 35427 09495 11567 56534 60365 02736 32700
5    98879 34072 04189 31672 33357 53191 09807 85796
1. Number the units: students are numbered 001 to 270.
2. Choose a starting point: row 3, 2nd column (10484…).
3. Read off consecutive numbers (3-digit labels here): 104, 842, 461, 613, 466, 416, 180, 855, 118, 314, 577, 002, 896, …
4. If a number corresponds to a label, select that unit. If not, skip it. Of the numbers listed, 104, 180, 118, and 002 are selected. Continue until the desired sample size is obtained.
Or use a computer to generate random numbers from 1 to 270.
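The table-reading steps above can be sketched in Python. The function names are hypothetical, and two details are assumptions not stated on the slide: reading continues into the next row when a row runs out, and a label that repeats is skipped.

```python
# Random number table from the slide, rows 0 to 5.
TABLE = [
    "00157 37071 79553 31062 42411 79371 25506 69135",
    "38354 03533 95514 03091 75324 40182 17302 64224",
    "59785 46030 63753 53067 79710 52555 72307 10223",
    "27475 10484 24616 13466 41618 08551 18314 57700",
    "28966 35427 09495 11567 56534 60365 02736 32700",
    "98879 34072 04189 31672 33357 53191 09807 85796",
]

def read_labels(table, start_row, start_col, width):
    """Yield consecutive `width`-digit numbers, starting at the given row and column (0-based)."""
    groups = [g for row in table[start_row:] for g in row.split()]
    digits = "".join(groups[start_col:])
    for i in range(0, len(digits) - width + 1, width):
        yield int(digits[i:i + width])

def sample_from_table(table, start_row, start_col, population, size):
    """Keep labels between 1 and `population`, skipping repeats, until `size` units are chosen."""
    chosen = []
    for label in read_labels(table, start_row, start_col, len(str(population))):
        if 1 <= label <= population and label not in chosen:
            chosen.append(label)
        if len(chosen) == size:
            break
    return chosen

# Row 3, 2nd column corresponds to start_row=3, start_col=1 with 0-based columns.
sample = sample_from_table(TABLE, start_row=3, start_col=1, population=270, size=10)
print(sample)
```

If the table runs out before the sample is complete, the function returns the units found so far; a longer table or a new starting point would be needed.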
Systematic Sampling
Order the population of units in some way, select one of the first k units at random, and then select every k-th unit thereafter.
College survey: Order the list of rooms starting at the top floor of the 1st undergrad dorm. Pick one of the first 11 rooms at random (say, room 3), then pick every 11th room after that.
Note: often a good alternative to random sampling, but it can lead to a biased sample.
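A minimal sketch of the procedure, assuming the ordered units are held in a Python list. The function name `systematic_sample` and the list of 100 room numbers are hypothetical.

```python
import random

def systematic_sample(units, k, start=None):
    """Pick one of the first k units at random, then every k-th unit after it."""
    if start is None:
        start = random.randint(1, k)  # 1-based position among the first k units
    return units[start - 1::k]

rooms = list(range(1, 101))  # hypothetical ordered list of 100 rooms
picked = systematic_sample(rooms, k=11, start=3)
print(picked)  # [3, 14, 25, 36, 47, 58, 69, 80, 91]
```

Passing `start` explicitly reproduces the slide's "room 3" case; leaving it out draws the starting room at random.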
Stratified Random Sampling
Divide the population of units into groups (called strata) and take a simple random sample from each of the strata.
College survey: Two strata = undergrad & graduate dorms. Take a simple random sample of 15 rooms from each of the strata for a total of 30 rooms.
Ideal: stratify so that there is little variability in responses within each of the strata.
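The college survey can be sketched as follows. The room labels and stratum sizes (120 undergrad rooms, 60 graduate rooms) are made up for illustration, and `stratified_sample` is a hypothetical helper name.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Take a simple random sample of `per_stratum` units from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, per_stratum) for name, units in strata.items()}

# Hypothetical strata: room labels for undergrad and graduate dorms.
strata = {
    "undergrad": [f"U{i}" for i in range(1, 121)],
    "graduate": [f"G{i}" for i in range(1, 61)],
}
rooms_by_stratum = stratified_sample(strata, per_stratum=15, seed=0)
total_rooms = sum(len(rooms) for rooms in rooms_by_stratum.values())
print(total_rooms)  # 30
```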
Stratified Proportional Allocation
Example: Suppose 38 students in a class were classified based on place of birth. 20 are from NCR, 8 from Luzon (other than NCR), 6 from the Visayas, and 4 from Mindanao. If a sample of 10 is to be taken, how many from each classification should be selected?
Solution:
NCR: 20 * (10/38) = 5.26 ≈ 5
Luzon: 8 * (10/38) = 2.11 ≈ 2
Visayas: 6 * (10/38) = 1.58 ≈ 2
Mindanao: 4 * (10/38) = 1.05 ≈ 1
Total = 5 + 2 + 2 + 1 = 10
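The allocation above can be reproduced with a short Python sketch. The function name `proportional_allocation` is hypothetical, and rounding each share to the nearest whole number is the rule implied by the worked solution.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Allocate the sample to each stratum in proportion to its size, rounded to the nearest unit."""
    total = sum(strata_sizes.values())
    return {name: round(size * sample_size / total) for name, size in strata_sizes.items()}

birthplaces = {"NCR": 20, "Luzon": 8, "Visayas": 6, "Mindanao": 4}
allocation = proportional_allocation(birthplaces, sample_size=10)
print(allocation)  # {'NCR': 5, 'Luzon': 2, 'Visayas': 2, 'Mindanao': 1}
```

Rounding each stratum separately happens to total 10 here. In general the rounded counts can add up to one more or one less than the target, so the total should be checked and adjusted by hand.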
Cluster Sampling
Divide the population of units into groups (called clusters), take a random sample of clusters, and measure only those items in these clusters.
College survey: Each floor of each dorm is a cluster. Take a random sample of 5 floors, and all rooms on those floors are surveyed.
Advantage: need only a list of the clusters instead of a list of all individuals.
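A sketch of the floor-based survey, under made-up assumptions: 3 dorms with 4 floors each and 10 rooms per floor. The function name `cluster_sample` is hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every unit inside the chosen ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: each (dorm, floor) pair maps to the rooms on that floor.
floors = {
    (dorm, floor): [f"{dorm}-{floor}{room:02d}" for room in range(1, 11)]
    for dorm in ("A", "B", "C")
    for floor in range(1, 5)
}
surveyed = cluster_sample(floors, n_clusters=5, seed=1)
print(len(surveyed), sum(len(rooms) for rooms in surveyed.values()))  # 5 50
```

Only the list of floors is sampled; the rooms are looked up afterwards, which mirrors the stated advantage of needing only a list of clusters.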
Summation
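Data: x_1 = 7, x_2 = 3, x_3 = 4, x_4 = 6, x_5 = 20
y_1 = 1, y_2 = 3, y_3 = 2, y_4 = -1, y_5 = 0
\[ \sum x = \text{sum of all } x \text{ data values} = 40 \]
\[ \sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5 = 40 \]
\[ \sum_{i=1}^{5} x_i^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2 = 49 + 9 + 16 + 36 + 400 = 510 \]
\[ \sum_{i=1}^{5} i = 1 + 2 + 3 + 4 + 5 = 15 \]
\[ \sum_{i=2}^{5} x_i y_i = 9 + 8 - 6 + 0 = 11 \]
\[ \sum_{i=1}^{2} \sum_{j=1}^{3} x_i y_j = \sum_{i=1}^{2} (x_i + 3x_i + 2x_i) = (7 + 21 + 14) + (3 + 9 + 6) = 60 \]
\[ \left( \sum_{i=1}^{5} 3x_i \right)^2 = (120)^2 = 14400 \]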
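The summation examples on this slide can be checked with plain Python. The variable names are illustrative; since Python lists are 0-indexed, x_i is written `x[i - 1]`.

```python
x = [7, 3, 4, 6, 20]
y = [1, 3, 2, -1, 0]

sum_x = sum(x)                                    # 40
sum_x_squared = sum(v ** 2 for v in x)            # 510
sum_i = sum(range(1, 6))                          # 15
# Sum of x_i * y_i for i = 2..5
sum_xy = sum(x[i - 1] * y[i - 1] for i in range(2, 6))  # 11
# Double sum of x_i * y_j for i = 1..2 and j = 1..3
double_sum = sum(x[i - 1] * y[j - 1] for i in range(1, 3) for j in range(1, 4))  # 60
# Square of the sum of 3 * x_i
square_of_sum = sum(3 * v for v in x) ** 2        # 14400

print(sum_x, sum_x_squared, sum_i, sum_xy, double_sum, square_of_sum)
```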
Summation Theorems
\[ \sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i \]
\[ \sum_{i=1}^{n} c\,x_i = c \sum_{i=1}^{n} x_i \]
\[ \sum_{i=1}^{n} c = nc \]
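As a numeric spot check rather than a proof, the three theorems can be tested on the x and y data from the Summation slide. The constant c = 3 is an arbitrary choice for illustration.

```python
x = [7, 3, 4, 6, 20]
y = [1, 3, 2, -1, 0]
c = 3          # arbitrary constant, chosen only for this check
n = len(x)

# Theorem 1: the sum of (x_i + y_i) equals the sum of x_i plus the sum of y_i.
lhs_sum = sum(a + b for a, b in zip(x, y))
rhs_sum = sum(x) + sum(y)

# Theorem 2: a constant factor can be moved outside the summation.
lhs_scaled = sum(c * a for a in x)
rhs_scaled = c * sum(x)

# Theorem 3: summing a constant n times gives n * c.
lhs_const = sum(c for _ in range(n))
rhs_const = n * c

print(lhs_sum == rhs_sum, lhs_scaled == rhs_scaled, lhs_const == rhs_const)  # True True True
```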
Multidimensional Data
# of students x_ij
                   1st Room (j=1)   2nd Room (j=2)   3rd Room (j=3)   4th Room (j=4)
1st Floor (i=1)    x_11 = 40        x_12 = 28        x_13 = 45        x_14 = 30
2nd Floor (i=2)    x_21 = 42        x_22 = 31        x_23 = 41        x_24 = 25
\[ \sum_{i=1}^{2} \sum_{j=1}^{4} x_{ij} = \sum_{i=1}^{2} (x_{i1} + x_{i2} + x_{i3} + x_{i4}) \]
\[ = (x_{11} + x_{12} + x_{13} + x_{14}) + (x_{21} + x_{22} + x_{23} + x_{24}) \]
\[ = (40 + 28 + 45 + 30) + (42 + 31 + 41 + 25) = 282 \]
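The double summation over floors and rooms maps naturally onto a nested list, with one inner list per floor. This is a sketch with illustrative variable names.

```python
# students[i][j] holds x_ij for floor i + 1 and room j + 1 (Python indexes from 0).
students = [
    [40, 28, 45, 30],  # 1st floor
    [42, 31, 41, 25],  # 2nd floor
]

floor_totals = [sum(floor) for floor in students]  # inner sum over j, one total per floor
total_students = sum(floor_totals)                 # outer sum over i

print(floor_totals, total_students)  # [143, 139] 282
```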
Exercise
If \( \sum_{i=1}^{5} x_i = 50 \) and \( \sum_{i=1}^{5} x_i^2 = 300 \), evaluate \( \sum_{i=1}^{5} (x_i - 3)^2 \).
\[ \sum_{i=1}^{5} (x_i - 3)^2 = \sum_{i=1}^{5} (x_i^2 - 6x_i + 9) \]
\[ = \sum_{i=1}^{5} x_i^2 - 6 \sum_{i=1}^{5} x_i + \sum_{i=1}^{5} 9 \]
\[ = 300 - 6(50) + 9(5) = 45 \]
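The expansion used in the exercise generalizes to any shift a: the sum of (x_i - a)^2 equals the sum of x_i^2, minus 2a times the sum of x_i, plus n times a^2. A small sketch follows; the function name `sum_squared_shift` is hypothetical, and the identity is cross-checked on the x data from the Summation slide.

```python
def sum_squared_shift(sum_x, sum_x_squared, n, a):
    """Evaluate the sum of (x_i - a)^2 from the sum of x_i, the sum of x_i^2, and the count n."""
    return sum_x_squared - 2 * a * sum_x + n * a ** 2

# The exercise: sum of x_i is 50, sum of x_i^2 is 300, n = 5, a = 3.
answer = sum_squared_shift(sum_x=50, sum_x_squared=300, n=5, a=3)
print(answer)  # 45

# Cross-check of the identity on actual data.
x = [7, 3, 4, 6, 20]
direct = sum((v - 3) ** 2 for v in x)
via_identity = sum_squared_shift(sum(x), sum(v ** 2 for v in x), len(x), 3)
print(direct, via_identity)  # 315 315
```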
End