quants
What is a Variable?
• any entity that can take on different values
• not always 'quantitative' or numerical, but we can assign numerical values
• attribute = a specific value of a variable
Examples:
• gender: 1 = female; 2 = male
• attitudes: 1 = strongly disagree; 2 = disagree; 3 = neutral; 4 = agree; 5 = strongly agree
Coding in a data matrix
Gender: Male = 1; Female=2
Political Orientation: Traditionalist=1; Moderate=2; Progressive=3
Social Class: Working=1; Upper working=2; Lower middle=3; Middle=4; Upper middle=5
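A minimal sketch of how the coding scheme above could be applied to build a data matrix. The codes are the ones on the slide; the three respondents and the names `CODEBOOK` and `encode` are hypothetical, introduced only for illustration.

```python
# Codes taken from the slide; variable names are illustrative.
CODEBOOK = {
    "gender": {"Male": 1, "Female": 2},
    "political_orientation": {"Traditionalist": 1, "Moderate": 2, "Progressive": 3},
    "social_class": {
        "Working": 1,
        "Upper working": 2,
        "Lower middle": 3,
        "Middle": 4,
        "Upper middle": 5,
    },
}

# Hypothetical respondents (not from the slides).
respondents = [
    {"gender": "Female", "political_orientation": "Progressive", "social_class": "Middle"},
    {"gender": "Male", "political_orientation": "Moderate", "social_class": "Working"},
    {"gender": "Female", "political_orientation": "Traditionalist", "social_class": "Upper middle"},
]


def encode(respondent):
    """Return one row of the data matrix: one numeric code per variable."""
    return [CODEBOOK[variable][respondent[variable]] for variable in CODEBOOK]


# One row per respondent, one column per variable.
data_matrix = [encode(r) for r in respondents]

for row in data_matrix:
    print(row)
```

Each row is a case and each column a variable, which is the usual layout of a data matrix.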
Levels of Measurement
• different kinds of variables
(1) Nominal
(2) Ordinal
(3) Interval and Ratio
Nominal Variable
• used to classify things
• represents equivalence (=)
• adding, subtracting, multiplying or dividing nominal numbers is meaningless
• tells you how many categories there are in the scheme
Ordinal Variable
• ordering or ranking of the variable
• the relationship between numbered items
• ‘higher’, ‘lower’, ‘easier’, ‘faster’, ‘more often’
• equivalence (=) and relative size: > (greater than) and < (less than)
Interval (and Ratio) Variable
• All arithmetical operations are allowed (strictly, ratios are meaningful only for ratio variables, which have a true zero)
• intervals between each step are of equal size
• Examples:
  - length, weight, elapsed time, speed, temperature
Women’s Shoe Sizes
British         2     2.5   3     3.5   4     4.5   5     5.5   6     6.5   7     7.5   8
European        34    35    35.5  36    37    37.5  38    38.5  39    39.5  40    41    42
American        4.5   5     5.5   6     6.5   7     7.5   8     8.5   9     9.5   10    10.5
Japanese (cm)   21.5  22    22.5  23    23    23.5  24    24    24.5  25    25.5  26    26.5
Levels of measurement
• Nominal level: values are names
• Ordinal level: values have an inherent order, from more to less or higher to lower
• Interval level: values are numbers with equal intervals between them
Frequency distributions
• count number of occurrences that fall into each category of each variable
• allow you to compare information between groups of individuals
• also allow you to see what are the highest and lowest values and the value at which most scores cluster
• variables of any level of measurement can be displayed in a frequency table
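A frequency distribution can be sketched in a few lines. The responses below are hypothetical; the 1 to 5 attitude codes follow the earlier slide, and `collections.Counter` is just one convenient way to do the counting.

```python
from collections import Counter

# Attitude codes as defined on the earlier slide.
LABELS = {
    1: "strongly disagree",
    2: "disagree",
    3: "neutral",
    4: "agree",
    5: "strongly agree",
}

# Hypothetical responses to a single attitude item.
responses = [4, 2, 5, 4, 3, 4, 1, 2, 4, 5, 3, 4]

# Count the number of occurrences in each category.
frequencies = Counter(responses)

for code in sorted(LABELS):
    print(f"{code}  {LABELS[code]:<18} {frequencies[code]}")

# The table also shows the lowest and highest values,
# and the value at which most scores cluster.
lowest, highest = min(responses), max(responses)
most_common_code, most_common_count = frequencies.most_common(1)[0]
print("lowest:", lowest, "highest:", highest, "most frequent:", most_common_code)
```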
Frequency table
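Percentages
• number of cases belonging to a particular category divided by the total number of cases and multiplied by 100.
• the total of percentages in any particular group equals 100 per cent.

$$\% = \frac{f}{N} \times 100$$

where f is the number of cases in the category and N is the total number of cases.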
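As a quick check of the percentage rule, here is a small sketch using the counts given for the Sociology students on the pie chart slide (Female 26, Male 6). The function name `percentage` is an illustrative choice.

```python
def percentage(f, n):
    """Cases in one category (f) as a percentage of all cases (n)."""
    return f / n * 100


# Counts from the 'Gender of Sociology Students' slide.
counts = {"Female": 26, "Male": 6}
total = sum(counts.values())

percentages = {category: percentage(f, total) for category, f in counts.items()}

for category, value in percentages.items():
    print(f"{category}: {value:.2f}%")

# The percentages within one group add up to 100.
print("total:", sum(percentages.values()))
```

Rounded to whole numbers these come out as 81% and 19%, the figures shown on the pie chart.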
Graphical presentation
• Pie charts
• Barcharts
• Line graphs
• Histograms
Pie chart
• illustrates the frequency (or percentage) of each individual category of a variable relative to the total.
• Pie charts are not appropriate for displaying quantitative data.
[Pie chart: Gender of Sociology Students. Female 26 (81%), Male 6 (19%)]
Barcharts
• the height of the bar is proportional to the frequency of the category of the variable - easy to compare
• used for Nominal or Ordinal level variables (or discrete interval/ratio level variables with relatively few categories)
[Bar chart: Marital Status. Categories: Married, Living as married, Single, Divorced, Separated, Widowed, Missing. Frequency axis 0 to 160. Bar labels as extracted: 140, 60, 85, 75, 30, 35, 75]
Multiple barchart
[Multiple bar chart: Marital status in 1995 and 2000. Categories: Married, Living as married, Single, Divorced, Separated, Widowed. Vertical axis: Frequencies, 0 to 160]
Compound or Component barchart
[Compound bar chart: Sociology Students by Year (2001, 2002), each bar split into Male and Female. Vertical axis: Frequency, 0 to 100. Segment labels as extracted: 41, 42, 26, 39]
Line graphs
• interval/ratio level variables that are discrete
• need to arrange the values in order
[Line graph: Value of PRODUCT by YEAR, 1991 to 2001. Vertical axis 110 to 170]
Histograms
• represents continuous quantitative data
• The height of the bars corresponds to the frequency or percentage of cases in the class.
• The width of the bars represents the size of the intervals of the variable
• The horizontal axis is marked out using the midpoints of class intervals
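The grouping behind a histogram can be sketched as follows. The marks, the starting point of 40 and the class width of 10 are all hypothetical, and `histogram_counts` is an illustrative name. The sketch assumes each class includes its lower limit and excludes its upper limit.

```python
from collections import Counter


def histogram_counts(values, start, width):
    """Count values per class interval, keyed by the interval midpoint.

    Assumes classes [start, start + width), [start + width, start + 2 * width), ...
    """
    counts = Counter()
    for value in values:
        index = int((value - start) // width)        # which class the value falls in
        midpoint = start + index * width + width / 2  # label used on the horizontal axis
        counts[midpoint] += 1
    return dict(sorted(counts.items()))


# Hypothetical continuous marks.
marks = [41, 47.5, 52, 53, 58, 58.5, 61, 66, 72]

table = histogram_counts(marks, start=40, width=10)

for midpoint, frequency in table.items():
    # One '#' per case: the bar height is the class frequency.
    print(f"{midpoint:5.1f}  {'#' * frequency}")
```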
Example: Histogram
[Histogram: MARKS. Class midpoints 25.0 to 85.0 in steps of 5; frequency axis 0 to 30. Std. Dev = 11.83, Mean = 55.4, N = 100]
Graphs have the capacity to distort
[Two line graphs of Value of PRODUCT by YEAR. In the first (1991 to 2001) the vertical axis runs from 110 to 170; in the second (1990 to 2001) it runs from 0 to 200]
Measures of Central Tendency
• describe sets of numbers briefly, yet accurately
• describe groups of numbers by means of other, but fewer numbers
• Three main measures:
  – mean
  – median
  – mode
The Mean
• most common type of average that is computed.
When to use the Mean
• When values in a particular group cluster closely around a central value, the mean is a good way of indicating the ‘typical’ score, i.e. it is truly representative of the numbers.
• If the values are very widely spread, are very unevenly distributed, or clustered around extreme values, then the mean can be misleading, and other measures of central tendency should be used instead.
The Median
• Also an average, but of a different kind.
• It is defined as the midpoint in a set of scores. It is the point at which one-half, or 50%, of the scores fall above and one-half, or 50%, fall below.
• Computing the Median:
(1) List the scores in order, either from highest to lowest or lowest to highest.
(2) Find the middle score. That’s the median.
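The two steps can be sketched as below. The slide does not say what to do when the number of scores is even; the sketch assumes the common convention of averaging the two middle scores. The example scores are hypothetical.

```python
def median(scores):
    """Median by the two steps above: order the scores, then take the middle one."""
    if not scores:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(scores)          # step 1: list the scores in order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # step 2: odd count, a single middle score
        return ordered[middle]
    # Assumption: for an even count, average the two middle scores.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([7, 3, 9]))      # odd number of scores
print(median([1, 2, 3, 4]))   # even number of scores
```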
The Median: Pros and Cons
• time-consuming
• if one of the numbers near the middle of the distribution moves even slightly, then the median would alter, unlike the mean, which is relatively unaffected by a change in one of the central numbers
• if one of the extreme values changes, then the median remains unaltered.
  - 2, 80, 100, 120, 130, 140, 160, 200, 3150
• single scores which are quite clearly ‘deviant’ when compared with others are known as outliers – here 2 and 3150
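A short sketch of the point about extreme values, using the nine scores listed above and Python's standard `statistics` module. The replacement value 300 is a hypothetical change to the top outlier, chosen only to show the effect.

```python
import statistics

# Scores from the slide, including the outliers 2 and 3150.
scores = [2, 80, 100, 120, 130, 140, 160, 200, 3150]

mean_with_outlier = statistics.mean(scores)
median_with_outlier = statistics.median(scores)

# Hypothetical change: pull the top outlier down to 300.
adjusted = scores[:-1] + [300]
mean_adjusted = statistics.mean(adjusted)
median_adjusted = statistics.median(adjusted)

print("mean:  ", round(mean_with_outlier, 1), "->", round(mean_adjusted, 1))
print("median:", median_with_outlier, "->", median_adjusted)
```

The mean shifts a great deal when the extreme value changes, while the median stays at the middle score.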
The Mode
• the value in any set of scores that occurs most often
• example 1: 5, 6, 7, 8, 8, 8, 9, 10, 10, 12 – the mode = 8
• example 2: 5, 6, 7, 8, 8, 8, 9, 10, 10, 10, 12 – two modes: 8 and 10 – bimodal
• very unstable figure
  – 1, 1, 6, 7, 8, 10 – mode = 1
  – 1, 6, 7, 8, 10, 10 – mode = 10
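The mode examples above can be reproduced with `statistics.multimode` (available in Python 3.8 and later), which returns every value tied for the highest frequency. This is a sketch of one way to compute it, not the only one.

```python
from statistics import multimode

example_1 = [5, 6, 7, 8, 8, 8, 9, 10, 10, 12]
example_2 = [5, 6, 7, 8, 8, 8, 9, 10, 10, 10, 12]

modes_1 = multimode(example_1)   # a single mode
modes_2 = multimode(example_2)   # two modes: a bimodal distribution

# Instability: changing one score moves the mode from one end to the other.
unstable_a = multimode([1, 1, 6, 7, 8, 10])
unstable_b = multimode([1, 6, 7, 8, 10, 10])

print(modes_1, modes_2, unstable_a, unstable_b)
```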
When to Use What?
• depends on the type of data that you are describing
  – for nominal data - only the mode
  – for ordinal data - mode and median
  – for interval data - all of them
• but, for extreme scores - use the median
Measure of dispersion (spread)
• better impression of a distribution’s shape
• measures indicate how widely scattered the numbers are
• how different scores are from one particular score – the mean
• variability - a measure of how much each score in a group of scores differs from the mean
The range
• tells us over how many numbers altogether a distribution is spread
$$r = h - l$$
• where
  – r is the range
  – h is the highest score in the data set
  – l is the lowest score in the data set.
Example data: 12, 13, 12, 11, 14, 13, 12, 10, 10, 11, 55
[Chart: the eleven values plotted in order; vertical axis 0 to 60, horizontal axis 0 to 12]
r = biggest value - smallest value = 55-10 = 45
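The same calculation as a small sketch, using the eleven values from the slide. The second calculation, which leaves out the value 55, is an added illustration of how strongly a single extreme score drives the range.

```python
def value_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)


# Values from the slide.
scores = [12, 13, 12, 11, 14, 13, 12, 10, 10, 11, 55]

r = value_range(scores)
print("range:", r)

# Illustration only: the same data without the single extreme value 55.
r_without_extreme = value_range([x for x in scores if x != 55])
print("range without 55:", r_without_extreme)
```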
The mean deviation
• number which indicates how much, on average, the scores in a distribution differ from a central point, the mean.
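$$\text{Mean deviation} = \frac{\sum |X - \bar{X}|}{N}$$

Scores (X): 160, 580, 2, 738, 6, 734, 51, 689
Mean = 370
Deviations (X - mean): -210, 210, -368, 368, -364, 364, -319, 319
[Chart: the eight scores plotted against the mean of 370; vertical axis 0 to 800, horizontal axis 0 to 10]
Σ(X - mean) = (-210)+210+(-368)+368+(-364)+364+(-319)+319 = 0
Σ|X - mean| = 210+210+368+368+364+364+319+319 = 2522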
mean deviation = 2522/8 = 315.25
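The worked example can be checked with a short sketch using the eight scores from the slide. The function name `mean_deviation` is illustrative.

```python
def mean_deviation(scores):
    """Average absolute distance of the scores from their mean."""
    mean = sum(scores) / len(scores)
    return sum(abs(x - mean) for x in scores) / len(scores)


# Scores from the slide.
scores = [160, 580, 2, 738, 6, 734, 51, 689]

mean = sum(scores) / len(scores)
signed_deviations = [x - mean for x in scores]

print("mean:", mean)
# Signed deviations always cancel out, which is why absolute values are used.
print("sum of signed deviations:", sum(signed_deviations))
print("sum of absolute deviations:", sum(abs(d) for d in signed_deviations))
print("mean deviation:", mean_deviation(scores))
```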
The standard deviation (SD)
• represents the average amount of variability
• It is the average distance from the mean

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$

• s is the standard deviation
• Σ means: find the sum of what follows
• X is each individual score
• X̄ is the mean of all the scores
• N is the sample size
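A sketch of the standard deviation calculation, reusing the eight scores from the mean deviation slide. It assumes the sum of squared deviations is divided by N, the sample size named in the legend above; many textbooks divide by N - 1 instead, and that variant is shown for comparison using `statistics.stdev`.

```python
import math
import statistics

# Scores reused from the mean deviation example.
scores = [160, 580, 2, 738, 6, 734, 51, 689]


def standard_deviation(values):
    """Square root of the average squared deviation from the mean (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / n)


s = standard_deviation(scores)

# For comparison: the N - 1 version from the standard library.
sample_s = statistics.stdev(scores)

print("s (divide by N):    ", round(s, 2))
print("s (divide by N - 1):", round(sample_s, 2))
```

Squaring the deviations serves the same purpose as taking absolute values in the mean deviation: it stops positive and negative deviations from cancelling.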
Standard deviations
Shape of Normal Distribution
[Figure: normal curve with labels: Mean, Median and Mode; Symmetrical; Asymptotic tail]
The area under the curve
• A normal distribution always has the same relative proportions of scores falling between particular values of the numbers involved.
• Areas under the curve = proportion of scores lying in the various parts of the complete distribution
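The claim about fixed proportions can be illustrated with `statistics.NormalDist` (Python 3.8 and later). The mean of 55.4 and standard deviation of 11.83 are borrowed from the MARKS histogram purely as example parameters; the sketch does not claim those marks are normally distributed. `proportion_within` is an illustrative helper.

```python
from statistics import NormalDist


def proportion_within(dist, k):
    """Proportion of scores lying within k standard deviations of the mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)


standard = NormalDist(mu=0, sigma=1)          # standard normal distribution
marks_like = NormalDist(mu=55.4, sigma=11.83)  # example parameters only

for k in (1, 2, 3):
    a = proportion_within(standard, k)
    b = proportion_within(marks_like, k)
    # The proportions are the same whatever the mean and standard deviation.
    print(f"within {k} SD: {a:.4f} vs {b:.4f}")
```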
SS2008N - Surveys
Median
[Figure: the median divides the distribution into two halves of 50% each]
Quartiles
[Figure: Quartile 1, the Median and Quartile 3 divide the distribution into four parts of 25% each]
Standard Deviation