Day 3: Descriptive Statistics
DESCRIPTION
These slides introduce descriptive statistics and how they differ from inferential statistics. They also discuss organizing and graphing data.
TRANSCRIPT
Sunday, April 9, 2023
Statistics refers to methods and techniques used for describing, organizing, analyzing, and interpreting numerical data.
The field of statistics is often divided into two broad categories: descriptive statistics and inferential statistics.
Descriptive statistics transform a set of numbers or observations into indices that describe or characterize the data.
Thus, descriptive statistics are used to classify, organize, and summarize numerical data about a particular group of observations.
There is no attempt to generalize these statistics, which describe only one group, to other samples or populations.
In other words, descriptive statistics are used to summarize, organize, and reduce large numbers of observations.
Descriptive statistics portray and focus on "what is" with respect to the sample data. For example:
1. What is the average reading grade level of the fifth graders in the school?
2. How many teachers found in-service valuable?
3. What percentage of students want to go to college?
Inferential statistics (sampling statistics) involve selecting a sample from a defined population and studying that sample in order to draw conclusions and make inferences about the population.
Example: 100,000 fifth-grade students take an English achievement test.
A researcher randomly samples 1,000 students' scores.
Descriptive statistics are used to describe the sample.
Inferential statistics, based on the descriptive statistics, are used to estimate the scores of the entire population of 100,000 students.
Organizing and graphing data focuses on ways to organize numerical data and present them visually through graphs.
One way to organize your data is to create a frequency distribution.
Various software programs, such as Excel, can easily produce graphs for you.
Organizing data allows researchers and educators to describe, summarize, and report their data.
By organizing data, they can compare distributions and observe patterns.
In most cases, the original data we collect is not ordered or summarized.
Therefore, after collecting data, we may want to create a frequency distribution by ordering and tallying the scores.
A seventh-grade social studies teacher wants to assign end-of-term letter grades to the twenty-five students in her class. After administering a thirty-item final examination, the teacher records the students' test scores:
27  25  30  24  19
16  28  24  17  21
23  26  29  23  18
22  20  17  24  23
21  22  28  26  25
These scores show the number of correct answers obtained by each student on the social studies final examination.
Next, the teacher can create a frequency distribution by ordering and tallying these scores.
Score   Frequency      Score   Frequency
30      1              22      2
29      1              21      2
28      2              20      1
27      1              19      1
26      2              18      1
25      2              17      2
24      3              16      1
23      3
The researcher or teacher may want to group the scores into class intervals in order to assign letter grades to the students.
Class interval (5 points)   Midpoint   Frequency
26-30                       28         7
21-25                       23         12
16-20                       18         6
∑                                      25
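The tallying above can also be done programmatically. The following is a minimal Python sketch, not part of the original slides; the helper name grouped_frequency and the choice to start the lowest interval at 16 are assumptions made for illustration.

```python
from collections import Counter

# The twenty-five social studies scores from the example above.
scores = [27, 25, 30, 24, 19,
          16, 28, 24, 17, 21,
          23, 26, 29, 23, 18,
          22, 20, 17, 24, 23,
          21, 22, 28, 26, 25]

# Ungrouped frequency distribution: tally each score, highest score first.
frequency = Counter(scores)
for score in sorted(frequency, reverse=True):
    print(score, frequency[score])


def grouped_frequency(values, lowest, width):
    """Tally values into class intervals of `width` points, starting at `lowest`.

    Hypothetical helper for illustration; returns {(low, high): count}.
    """
    table = {}
    for value in values:
        start = lowest + ((value - lowest) // width) * width
        interval = (start, start + width - 1)
        table[interval] = table.get(interval, 0) + 1
    return table


# Class intervals of five points: 16-20, 21-25, 26-30.
grouped = grouped_frequency(scores, lowest=16, width=5)
for (low, high), count in sorted(grouped.items(), reverse=True):
    midpoint = (low + high) / 2
    print(f"{low}-{high}", midpoint, count)
```

Under these assumptions the grouped tally reproduces the frequencies 7, 12, and 6 shown in the table.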
A researcher conducting an experimental study administered a thirty-item reading comprehension test and then recorded the students' reading scores. Create a frequency distribution of the thirty scores with class intervals of five points and interval midpoints.
74  80  66  69  63  65
61  62  58  59  57  58
57  57  55  56  53  54
51  52  49  50  47  48
31  44  43  36  39  41
Graphs are usually used to communicate information by transforming numerical data into a visual form.
Graphs allow us to see relationships not easily apparent by looking at the numerical data.
There are various forms of graphs, each appropriate for a different type of data.
In drawing a histogram or a frequency polygon, the vertical axis always represents frequencies, and the horizontal axis always represents scores or class intervals (midpoints).
The lower values of both vertical and horizontal axes are recorded at the intersection of the axes (at the bottom left side).
[Figure: a pair of axes, each labeled from lowest to highest, with the lowest values at the intersection.]
The frequency distribution in the following table can be depicted using two types of graphs: a histogram or a frequency polygon.
Score   Frequency
6       1
5       2
4       4
3       3
2       2
1       1
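As a rough illustration of how this table maps to a graph, the Python sketch below prints a text histogram and lists the points a frequency polygon would connect. It is only a sketch: the function name text_histogram is hypothetical, and the bars run sideways (scores down the page, frequencies across) because that is easier to print as text, unlike a drawn histogram where frequencies sit on the vertical axis.

```python
# Frequency table from the example above: score -> frequency.
frequency = {6: 1, 5: 2, 4: 4, 3: 3, 2: 2, 1: 1}


def text_histogram(table):
    """Return one line per score, lowest first, with one '#' per occurrence."""
    return [f"{score:>2} | {'#' * table[score]}" for score in sorted(table)]


for line in text_histogram(frequency):
    print(line)

# A frequency polygon joins the points (score, frequency) with straight lines.
polygon_points = [(score, frequency[score]) for score in sorted(frequency)]
print(polygon_points)
```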
A Frequency Distribution of Twenty-five Scores with Class Intervals and Midpoints
Class Interval   Midpoint   Frequency
38-42            40         1
33-37            35         3
28-32            30         4
23-27            25         6
18-22            20         5
13-17            15         3
8-12             10         2
3-7              5          1
The following data are the unorganized examination scores of two groups taught with different methods.
Group A (Language laboratory) N=30
15  12  11  18  15  15   9
19  14  13  11  12  18  15  16
14  16  17  15  17  13  14  13  15  17  19  17  18  16  14
Group B (Non-language laboratory) N=30
11  16  14  18   6   8   9
14  12  12  10  15  12   9
13
16  17  12   8   7
15   5
14  13  13  12  11  13  11   7
Using the unorganized examination scores of the two groups taught with different methods:
a. Arrange the frequency distribution of the scores.
b. Arrange the frequency distribution of the scores in class intervals of five points.
c. Draw the histogram of the scores.
d. Draw the frequency polygon of the scores.
e. Draw a conclusion from the histogram and frequency polygon you graphed.
Measures of central tendency are descriptive statistics that measure the central location or value of sets of scores.
A measure of central tendency is a summary score that is used to represent a distribution of scores.
Measures of central tendency are widely used to summarize and simplify large quantities of data.
The mode of the distribution is the score that occurs with the greatest frequency in that distribution.
Score   Frequency
12      1
11      1
10      2
9       3
8       4   (mode)
7       2
6       1
5       1
We can see that the score of 8 is repeated the most (four times); therefore, the mode of the distribution is 8.
What is the mode of the distribution below?
Scores: 16  17  18  18  20  22  22  22  23
We can see that the score of 22 is repeated the most (three times); therefore, the mode of the distribution is 22.
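The two mode examples above can be checked with a short Python sketch. The function name modes is hypothetical; it returns a list because a distribution may have more than one score tied for the greatest frequency, a case the slides do not cover.

```python
from collections import Counter


def modes(scores):
    """Return every score that occurs with the greatest frequency."""
    counts = Counter(scores)
    highest = max(counts.values())
    return sorted(score for score, count in counts.items() if count == highest)


# First example, expanded from its frequency table (8 occurs four times).
first = [12, 11, 10, 10, 9, 9, 9, 8, 8, 8, 8, 7, 7, 6, 5]
# Second example (22 occurs three times).
second = [16, 17, 18, 18, 20, 22, 22, 22, 23]

print(modes(first))   # [8]
print(modes(second))  # [22]
```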
The median is the middle point of a distribution of scores that are ordered.
Fifty percent of the scores are above the median, and 50 percent are below it.
Scores: 10  8  7  6  4  2  1
Median: 6
The score 6 is the median because there are three scores above it and three below it.
If the distribution has an even number of scores, the median is the average of the two middle scores.
Scores: 20  16  12  10  8  7  7  6  4  2
The two middle scores are 8 and 7.
Thus, the median of the scores above is (7 + 8) / 2 = 7.5
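Both median rules (odd and even number of scores) can be written as one small Python function. This is a sketch for illustration only; the function name median is chosen here, and Python's standard statistics.median follows the same rule.

```python
def median(scores):
    """Middle score of the ordered list; average of the two middle scores if even."""
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([10, 8, 7, 6, 4, 2, 1]))              # 6
print(median([20, 16, 12, 10, 8, 7, 7, 6, 4, 2]))  # 7.5
```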
The mean is the "arithmetic average" of a set of scores.
It is obtained by adding up the scores and dividing that sum by the number of scores.
The statistical symbol for the mean of a sample is X̄ (pronounced "ex bar").
A raw score is represented in statistics by the letter X.
A raw score is a score as it was obtained on a test or any other measure, without converting it to any other scale.
The statistical symbol for the population mean is µ, the Greek letter mu (pronounced “moo” or “mew”).
The statistical symbol for “sum of” is ∑ (the capital Greek letter sigma).
The formula for calculating the mean is
X̄ = ∑X / N (for a sample)
or
µ = ∑X / N (for a population)
Calculation of the mean for a sample of eight scores: 17, 14, 14, 13, 10, 8, 7, 7
Answer: By using raw scores
Scores: 17  14  14  13  10  8  7  7
∑X = 17 + 14 + 14 + 13 + 10 + 8 + 7 + 7 = 90
N = 8
Thus, the mean is X̄ = ∑X / N = 90 / 8 = 11.25
Calculation of the mean for the same sample of eight scores: 17, 14, 14, 13, 10, 8, 7, 7
Answer: By using the score distribution
Score (X)   Frequency (f)   f x Score (fX)
17          1               17
14          2               28
13          1               13
10          1               10
8           1               8
7           2               14
∑           8               90
∑fX = 17 + 28 + 13 + 10 + 8 + 14 = 90
N = ∑f = 8
Thus, the mean is X̄ = ∑fX / N = 90 / 8 = 11.25
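The two ways of computing the mean shown above (from raw scores and from a frequency distribution) can be compared in a few lines of Python. This is a sketch for illustration; the variable names are chosen here and do not come from the slides.

```python
# Method 1: raw scores. Add up the scores and divide by the number of scores.
scores = [17, 14, 14, 13, 10, 8, 7, 7]
mean_raw = sum(scores) / len(scores)

# Method 2: frequency distribution. Multiply each score by its frequency,
# add up the products, and divide by the total frequency.
frequency = {17: 1, 14: 2, 13: 1, 10: 1, 8: 1, 7: 2}
n = sum(frequency.values())
sum_fx = sum(score * f for score, f in frequency.items())
mean_freq = sum_fx / n

print(sum(scores), len(scores), mean_raw)  # 90 8 11.25
print(sum_fx, n, mean_freq)                # 90 8 11.25
```

Both methods give the same result because the frequency table is only a compact way of writing the same eight scores.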
Measures of variability are used to show the differences among the scores in a distribution.
We use the term variability or dispersion because the statistics provide an indication of how different, or dispersed, the scores are from one another.
The range is the simplest, but also the least useful, measure of variability.
It is defined as the distance between the smallest and the largest scores.
It is calculated by simply subtracting the bottom, or lowest, score from the top, or highest, score.
Range = XH - XL
XH = the highest score
XL = the lowest score
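As a quick check of the range formula, here is a minimal Python sketch. The function name score_range is hypothetical, and the data are the eight scores from the earlier mean example, reused here only for illustration.

```python
def score_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)


# Eight scores reused from the mean example: highest is 17, lowest is 7.
scores = [17, 14, 14, 13, 10, 8, 7, 7]
print(score_range(scores))  # 10
```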
Determine the range and the mean of the following sets of figures:
a. 1, 4, 9, 11, 15, 19, 24, 29, 34
b. 14, 15, 15, 16, 16, 16, 18, 18, 18
Answer a: Mean = ........ Range = ........
Answer b: Mean = ........ Range = ........
The distance between each score in a distribution and the mean of that distribution is called the deviation score.
The standard deviation (SD) is the square root of the mean of the squared deviation scores. (The mean of the deviation scores themselves is always zero, because the positive and negative deviations cancel out.)
The standard deviation tells you "how close the scores are to the mean."
The SD describes the typical distance of the scores from the distribution mean.
Squaring the SD gives us another index of variability, called the variance.
The variance is needed in order to calculate the SD.
If the standard deviation is a small number, this tells you that the scores are "bunched together" close to the mean.
If the standard deviation is a large number, this tells you that the scores are “spread out” a greater distance from the mean.
The formula for standard deviation is:
SD = √( ∑(X - X̄)² / N )
or, for grouped scores,
SD = √( ∑f(X - X̄)² / N )
The variance (S²) is a measure of dispersion that indicates the degree to which scores cluster around the mean.
Computationally, the variance is the sum of the squared deviation scores about the mean divided by the total number of scores (N) or by the total number of scores minus one (N - 1).
S² = ∑(X - X̄)² / N
or
S² = ∑(X - X̄)² / (N - 1)
Suppose we have only five scores. It is very likely that such a small group of scores is a sample rather than a population. Therefore, we compute the variance and SD for these scores, treating them as a sample, and use a denominator of N - 1 in the computation.
When, on the other hand, we consider a set of scores to be a population, we should use a denominator of N to compute the variance.
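The effect of the denominator can be seen with Python's standard statistics module, which provides pvariance (denominator N) and variance (denominator N - 1). The sketch below is for illustration only; it reuses the eight scores from the earlier mean example, which is an assumption made here and not the five-score example the slide refers to.

```python
import statistics

# Eight scores reused from the mean example (an assumption for illustration).
scores = [17, 14, 14, 13, 10, 8, 7, 7]

# Treating the scores as a whole population: divide by N.
population_variance = statistics.pvariance(scores)

# Treating the scores as a sample: divide by N - 1.
sample_variance = statistics.variance(scores)

print(population_variance)  # sum of squared deviations (99.5) / 8
print(sample_variance)      # sum of squared deviations (99.5) / 7
```

The sample variance is always the larger of the two, and the gap shrinks as the number of scores grows.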
For any distribution of scores, the variance can be determined by following five steps:
Step 1: Calculate the mean: X̄ = ∑X / N
Step 2: Calculate the deviation scores: X - X̄
Step 3: Square each deviation score: (X - X̄)²
Step 4: Sum all the squared deviation scores: ∑(X - X̄)²
Step 5: Divide the sum by N: ∑(X - X̄)² / N
Calculate the standard deviation of the following scores: 2, 3, 3, 4, 5, 5, 5, 6, 6, 8
Answer: Calculate the variance by using the five steps.
Step 1: Calculate the mean: X̄ = ∑X / N = 47 / 10 = 4.7
Steps 2 and 3: Calculate the deviation scores and square each one:
Raw score (X)   Deviation (X - X̄)   Squared deviation
2               2 - 4.7 = -2.7      7.29
3               3 - 4.7 = -1.7      2.89
3               3 - 4.7 = -1.7      2.89
4               4 - 4.7 = -0.7      0.49
5               5 - 4.7 = 0.3       0.09
5               5 - 4.7 = 0.3       0.09
5               5 - 4.7 = 0.3       0.09
6               6 - 4.7 = 1.3       1.69
6               6 - 4.7 = 1.3       1.69
8               8 - 4.7 = 3.3       10.89
Step 4: Sum all the squared deviation scores: 28.10
Step 5: Divide the sum by N: 28.10 / 10 = 2.81
Thus, the standard deviation is SD = √2.81 ≈ 1.68
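The five steps of the worked example above can be traced in Python as follows. This is a minimal sketch that uses a denominator of N, as the worked example does; the variable names are chosen here for illustration.

```python
import math

scores = [2, 3, 3, 4, 5, 5, 5, 6, 6, 8]
n = len(scores)

mean = sum(scores) / n                    # Step 1: calculate the mean
deviations = [x - mean for x in scores]   # Step 2: deviation scores
squared = [d ** 2 for d in deviations]    # Step 3: square each deviation
sum_squared = sum(squared)                # Step 4: sum the squared deviations
variance = sum_squared / n                # Step 5: divide the sum by N

# The standard deviation is the square root of the variance.
sd = math.sqrt(variance)

print(round(mean, 2), round(sum_squared, 2), round(variance, 2), round(sd, 2))
# 4.7 28.1 2.81 1.68
```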
Calculate the standard deviation of the following scores: 20, 15, 15, 14, 14, 14, 12, 10, 8, 8
Answer: Calculate the variance by using the five steps.
Step 1: Calculate the mean: X̄ = ∑X / N
Step 2: Calculate the deviation scores: X - X̄
Step 3: Square each deviation score: (X - X̄)²
Step 4: Sum all the squared deviation scores: ∑(X - X̄)²
Step 5: Divide the sum by N: ∑(X - X̄)² / N