Chapter 2: Statistics of One Variable
Grade 12 Data Management
Data Analysis With Graphs
Statistics is the gathering, organization, analysis, and presentation of numerical information.
Unprocessed information collected for a study is called raw data.
The quantity being measured is a variable.
A continuous variable can have any value within a given range.
A discrete variable can have only certain separate values (often integers).
Frequency tables and frequency diagrams can give a convenient overview of the distribution of values of the variable and reveal trends in the data.
A histogram is a special form of bar graph in which the areas of the bars are proportional to the frequencies of the values of the variable.
The bars are connected to represent a continuous range of values.
A frequency polygon can illustrate the same information as a histogram or bar graph.
Example 1: Frequency Tables and Diagrams
Here are the sums of the two numbers from 50 rolls of a pair of standard dice.
Use a graph to illustrate the information in the frequency table.
[Figure: two graphs of the dice-roll data, a bar graph and a frequency polygon, each plotting Frequency (0 to 10) against Sum (2 to 12)]
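As a rough illustration of how a frequency table is built from raw data, the sketch below tallies a short list of dice sums. The list `sums` is hypothetical (it is not the data set from the slide), and `frequency_table` is a helper name invented for this example.

```python
from collections import Counter

# Hypothetical sums from a few rolls of two dice (not the slide's data).
sums = [7, 6, 8, 7, 5, 9, 7, 6, 10, 2, 12, 8]


def frequency_table(values, lowest=2, highest=12):
    """Return {value: frequency} for every possible sum, including zeros."""
    counts = Counter(values)
    return {v: counts.get(v, 0) for v in range(lowest, highest + 1)}


table = frequency_table(sums)

# Print a simple text bar graph: one '#' per observation.
for value, freq in table.items():
    print(f"{value:>3} | {'#' * freq} ({freq})")
```

Listing every possible sum, even those with a frequency of zero, keeps the horizontal axis complete when the table is turned into a bar graph or frequency polygon.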
Create a cumulative-frequency table and graph for the data.
[Figure: cumulative-frequency graph plotting Cumulative Frequency (0 to 60) against Sum (2 to 12); data labels: 4, 6, 9, 18, 26, 31, 37, 42, 46, 49, 50]
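A cumulative frequency is a running total of the frequencies. The sketch below shows one way to compute it. The list `frequencies` is an assumption: it was inferred by differencing the data labels that survive from the cumulative-frequency graph, so treat it as illustrative rather than as the original table.

```python
from itertools import accumulate

sums = list(range(2, 13))

# Assumed frequencies, inferred from the graph's data labels (not confirmed
# by the original frequency table, which did not survive extraction).
frequencies = [4, 2, 3, 9, 8, 5, 6, 5, 4, 3, 1]

# Each cumulative value is the total of all frequencies up to that sum.
cumulative = list(accumulate(frequencies))

print("Sum  Frequency  Cumulative")
for s, f, c in zip(sums, frequencies, cumulative):
    print(f"{s:>3}  {f:>9}  {c:>10}")
```

The last cumulative value should always equal the total number of observations, which is a quick check on the table.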
Example of Histogram
A frequency polygon may be superimposed onto the same grid as the histogram.
Setting Up a Frequency-Distribution Table
The values in the intervals should not overlap; otherwise, a value belonging to two intervals would create a consistency error.
For example, suppose an individual is 38 years old. Would that individual be placed in the “33 – 38” or the “38 – 42” interval? It would be impossible to decide consistently.
[Tables: Incorrect, Correct]
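The overlap problem can be checked mechanically. The sketch below is one possible approach, assuming whole-number data (such as ages in years) and intervals written as inclusive (low, high) pairs. `check_intervals` is a hypothetical helper, and apart from the 33 – 38 and 38 – 42 pair, the intervals are made up for illustration.

```python
def check_intervals(intervals):
    """Report overlaps and gaps between consecutive inclusive integer intervals."""
    problems = []
    ordered = sorted(intervals)
    for (low1, high1), (low2, high2) in zip(ordered, ordered[1:]):
        if low2 <= high1:
            # A value such as high1 would belong to both intervals.
            problems.append(f"overlap: {low1}-{high1} and {low2}-{high2}")
        elif low2 > high1 + 1:
            # Some whole-number values fall between the two intervals.
            problems.append(f"gap: {low1}-{high1} and {low2}-{high2}")
    return problems


print(check_intervals([(33, 38), (38, 42)]))  # the overlapping pair from the text
print(check_intervals([(33, 37), (38, 42)]))  # hypothetical corrected intervals
print(check_intervals([(33, 36), (38, 42)]))  # hypothetical intervals with a gap
```

For continuous data the gap test would need a different convention, for example half-open intervals where each upper bound equals the next lower bound.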
Indices
An index relates the value of a variable (or a group of variables) to a base level, which is often the value on a particular date.
Examples: CPI, TSE 300
Time-series graphs are often used to show how indices change over time.
Determine the rate of change by finding the slope between two points on the graph.
Rate of change = rise/run
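The rise/run calculation can be sketched as follows. The two points are hypothetical index readings chosen for easy arithmetic, not real CPI or TSE 300 values, and `rate_of_change` is a name invented for this example.

```python
def rate_of_change(point_a, point_b):
    """Return the slope (rise/run) between two (time, value) points."""
    (x1, y1), (x2, y2) = point_a, point_b
    if x2 == x1:
        raise ValueError("the two points must have different time values")
    return (y2 - y1) / (x2 - x1)


# Hypothetical readings from a time-series graph: (year, index value).
start = (1996, 100.0)
end = (2001, 112.5)

slope = rate_of_change(start, end)
print(f"The index rose by about {slope} points per year.")
```

The result is an average rate over the chosen period; picking two different points on the same graph will generally give a different value.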
Sampling Techniques
Population refers to all individuals who belong to a group being studied.
Sample refers to the segment of the population used in a study.
Sampling frame refers to the group of individuals who actually have a chance of being selected.
Simple Random Sample
Every member of the population has an equal chance of being selected, and the selection of any particular individual does not affect the chances of any other individual being chosen.
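A simple random sample can be drawn with Python's standard `random` module, as in the sketch below. The population of 200 labelled students and the sample size of 20 are hypothetical, and the seed is fixed only so the example is repeatable.

```python
import random

# Hypothetical sampling frame: 200 students identified by label.
population = [f"student_{i}" for i in range(1, 201)]

# A seeded generator makes the example repeatable; omit the seed in practice.
rng = random.Random(42)

# random.sample draws without replacement, so no student is chosen twice
# and every group of 20 students is equally likely.
sample = rng.sample(population, k=20)

print(sample)
```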
Systematic Sample
In a systematic sample, go through the population sequentially and select members at regular intervals.
The sample size and the population size determine the sampling interval.
Interval = population size/sample size
Suppose a study determines the interval to be 3040. The first individual may be selected from any of the first 3040 individuals; then select every 3040th individual from that point on.
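The procedure above can be sketched as follows. `systematic_sample` is a hypothetical helper; it rounds the interval down to a whole number and picks the starting position at random within the first interval, which is one common convention rather than the only one.

```python
import random


def systematic_sample(population, sample_size, rng=random):
    """Select members at a regular interval after a random starting point."""
    if not 0 < sample_size <= len(population):
        raise ValueError("sample size must be between 1 and the population size")
    # Interval = population size / sample size, rounded down.
    interval = len(population) // sample_size
    # Choose the starting position among the first `interval` members.
    start = rng.randrange(interval)
    # Take every interval-th member, trimming any extras.
    return population[start::interval][:sample_size]


# Hypothetical population of 100 numbered individuals, sample of 10.
population = list(range(1, 101))
chosen = systematic_sample(population, 10, random.Random(1))
print(chosen)
```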
Stratified Sample
A population may include groups of members who share common characteristics, such as gender, age, or education level.
Such groups are called strata.
A stratified sample has the same proportion of members from each stratum as the population does.
Example 2: Designing a Stratified Sample
Before booking bands for the high school dances, the students’ council at Statsville High School wants to survey the music preferences of the student body. The following table shows the enrolment at the school. Design a stratified sample for a survey of 25% of the student body.
Example 2: Solution
To obtain a stratified sample with the correct proportions, simply select 25% of the students in each grade level.
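The solution can be sketched in code. The enrolment table did not survive in this transcript, so the grade sizes below are hypothetical, and `stratified_sample` is a name invented for this example.

```python
import random


def stratified_sample(strata, fraction, rng=random):
    """Randomly select the same fraction of members from every stratum."""
    sample = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical enrolment by grade (not the figures from the original table).
enrolment = {"Grade 9": 220, "Grade 10": 200, "Grade 11": 180, "Grade 12": 160}
strata = {
    grade: [f"{grade} student {i}" for i in range(1, size + 1)]
    for grade, size in enrolment.items()
}

survey = stratified_sample(strata, 0.25, random.Random(7))
for grade, students in survey.items():
    print(grade, len(students))
```

Because each grade contributes 25% of its students, the sample keeps the same grade proportions as the whole school.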
Other Sampling Techniques
Cluster Sample: If certain groups are likely to be representative of the entire population, you can use a random selection of such groups as a cluster sample.
Multi-Stage Sample: Uses several levels of random sampling.
Voluntary-Response Sample: The researcher simply invites any member of the population to participate.
Convenience Sample: Often, a sample is selected simply because it is easily accessible.
Measures of Central Tendency
It is often convenient to use a central value to summarize a set of data.
Various methods exist to find values around which a set of data tends to cluster.
These are known as measures of central tendency.
Mean
Commonly referred to as the “average”
Population mean (N = size of the entire population):
$\mu = \frac{x_1 + x_2 + \dots + x_N}{N} = \frac{\sum x}{N}$
Sample mean (n = sample size):
$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum x}{n}$
The median is the middle value of the data when they are ranked from highest to lowest.
When there is an even number of values, the median is the midpoint between the two middle values.
The mode is the value that occurs most frequently in the distribution.
Some distributions do not have a mode, while others have several.
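The three measures can be computed with Python's standard `statistics` module, as sketched below. The data set is hypothetical and was chosen to have an even number of values and two modes.

```python
import statistics

# Hypothetical data set with an even number of values and two modes.
data = [3, 5, 5, 6, 7, 8, 8, 10]

mean = statistics.mean(data)         # sum of the values divided by their count
median = statistics.median(data)     # midpoint of the two middle values (6 and 7)
modes = statistics.multimode(data)   # every value tied for the highest frequency

print("mean:", mean)
print("median:", median)
print("modes:", modes)
```

Note that `statistics.multimode` returns every value when all frequencies are equal, so a list as long as the number of distinct values signals a distribution with no meaningful mode.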
Weighted Mean
A weighted mean gives a measure of central tendency that reflects the relative importance of the data:
$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum w_i x_i}{\sum w_i}$
It differs from the standard mean calculation because it gives a stronger weight (importance) to certain categories.
Example 3: Weighted Mean
The HR manager for Statsville Marketing Limited considers five criteria when interviewing a job applicant. The manager gives each applicant a score between 1 and 5 in each category, with 5 being the highest score. Each category has a weighting between 1 and 3. The following table lists a recent applicant’s score and the company’s weighting factors.
Determine the weighted mean score for this job applicant.
Example 3: Solution
$\bar{x}_w = \frac{2(4) + 2(2) + 3(5) + 3(5) + 1(4)}{2 + 2 + 3 + 3 + 1}$
$\bar{x}_w = \frac{8 + 4 + 15 + 15 + 4}{11}$
$\bar{x}_w = \frac{46}{11}$
$\bar{x}_w \approx 4.2$
Therefore, the applicant has a weighted mean score of approximately 4.2.
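The calculation in Example 3 can be reproduced with a short function. The original table is missing from this transcript, so the sketch assumes each product in the solution reads as weight(score), giving weights 2, 2, 3, 3, 1 and scores 4, 2, 5, 5, 4. `weighted_mean` is a name invented for this example.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Assumed reading of the solution: each term is weight(score).
scores = [4, 2, 5, 5, 4]
weights = [2, 2, 3, 3, 1]

result = weighted_mean(scores, weights)
print(round(result, 1))  # 46/11, which rounds to 4.2
```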
Grouped Data
When a set of data has been grouped into intervals, it is possible to approximate the mean using the formulas:
Population mean: $\mu = \frac{\sum f_i m_i}{\sum f_i}$
Sample mean: $\bar{x} = \frac{\sum f_i m_i}{\sum f_i}$
where $m_i$ is the midpoint value of an interval and $f_i$ is the frequency for that interval.
Estimate the median for grouped data by taking the midpoint of the interval within which the median is found.
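The grouped-data estimates can be sketched as follows. The intervals and frequencies are hypothetical, and both function names are invented for this example. The median function takes the first interval whose cumulative frequency reaches half the total, which is a simplification when the halfway point falls exactly on an interval boundary.

```python
def grouped_mean(intervals, frequencies):
    """Approximate the mean as sum(f_i * m_i) / sum(f_i) using midpoints m_i."""
    midpoints = [(low + high) / 2 for low, high in intervals]
    return sum(f * m for f, m in zip(frequencies, midpoints)) / sum(frequencies)


def grouped_median(intervals, frequencies):
    """Return the midpoint of the interval that contains the median."""
    half = sum(frequencies) / 2
    running = 0
    for (low, high), f in zip(intervals, frequencies):
        running += f
        if running >= half:
            return (low + high) / 2
    raise ValueError("frequencies must not be empty")


# Hypothetical table: each interval runs from low up to, but not including, high.
intervals = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
frequencies = [1, 4, 7, 3, 2]

print(round(grouped_mean(intervals, frequencies), 1))
print(grouped_median(intervals, frequencies))
```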
Example 4: Calculate the Mean and Median for Grouped Data
A group of children were asked how many hours a day they spend watching television. The table at the right summarizes their responses. Determine the mean and median number of hours for this distribution.
Example 4: Solution
$\bar{x} = \frac{\sum f_i m_i}{\sum f_i}$
$\bar{x} = \frac{49}{18}$
$\bar{x} \approx 2.7$
Therefore, the mean time the children spent watching television is approximately 2.7 h a day.
Note that the values for the mean and median are approximate because where the data lie within each interval cannot be determined.
Example 5: Determine the Error in the Frequency-Distribution Table
Explain the problem with the intervals in the following table.
Missing values between intervals
Homework
Page 101 #1a, 2, 3ab, 8, 12, 15
Page 117 #1, 2, 3, 7, 9
Page 133 #1-9, 11, 12, 14
Page 148-150 #1, 6a, 7a, 10, 19
Reminders:
Mid-Term Exam (Thursday)
Chapter 2 Quiz (entire chapter, next Monday)