hss2381a – stats... or whatever univariate analysis, part 1 descriptive statistics

Upload: jason-paul

Post on 18-Dec-2015

TRANSCRIPT

Page 1: Hss2381A – stats... or whatever Univariate Analysis, part 1 Descriptive Statistics

hss2381A – stats... or whatever

Univariate Analysis, part 1: Descriptive Statistics

• Evidence-based practice (EBP): Use of best clinical evidence in making patient care decisions

• Best source of evidence: Systematic research

Evidence-Based Medicine (EBM) or Evidence-Based Practice (EBP) Questions:

• How reliable is the evidence?
• What is the magnitude of effects?
• How precise is the estimate of effects?

• Answering these questions requires an understanding of statistics

Data and Data Analysis

• In the context of a study, the information gathered to address research questions is data

• In quantitative research, data are usually quantitative (numbers)

• Quantitative data are subjected to statistical analysis

Examples of Independent and Dependent Variables

• Independent variable (IV): Smoking
• Dependent variable (DV): Lung cancer
• IV → DV ?

Research Question

• Research questions are the queries researchers seek to answer through the collection and analysis of data

• Research questions communicate the research variables and the population (the entire group of interest)

– Example: In hospitalized children (population) does music (IV) reduce stress (DV)?

Defining a Variable

• Two phases:
– Conceptual
– Operational

Defining a Variable

• In studies, variables need to be defined
• Conceptual definition: The theoretical meaning of the underlying concept
• Operational definition: The precise set of operations and procedures used to measure the variable

Example:
Concept = How long have you been on this planet?
Operation = Which age group, in years, are you in?

Descriptive Statistics

• Researchers collect their data from a sample of study participants—a subset of the population of interest

• Descriptive statistics describe and summarize data about the sample

– Examples: Percent female in the sample, average weight of participants

Inferential Statistics

• Researchers obtain data from a sample but often want to draw conclusions about a population

• Parameter: A descriptive index for a population
– Example: Average daily caloric intake of all 10-year-old children in New York

• Statistic: A descriptive index for a sample
– Example: Average daily caloric intake of 300 10-year-old children from three particular NY schools
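The parameter/statistic distinction can be made concrete with a minimal Python sketch. The population below is simulated (the size of 10,000, the mean of 2000 and the spread of 300 are assumptions for illustration, not real intake data); the point is only that the same index is a parameter when computed on the whole population and a statistic when computed on a sample.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: daily caloric intake for 10,000 children.
# These values are simulated for illustration only.
population = [random.gauss(2000, 300) for _ in range(10_000)]

# Parameter: a descriptive index computed on the whole population.
parameter_mean = statistics.mean(population)

# Statistic: the same index computed on a sample of 300 children.
sample = random.sample(population, 300)
statistic_mean = statistics.mean(sample)

print(f"Population mean (parameter): {parameter_mean:.1f}")
print(f"Sample mean (statistic):     {statistic_mean:.1f}")
```

The two numbers will usually be close but not identical, which is why inferential statistics are needed to draw conclusions about the population from the sample.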

SPSS and Statistical Analysis

• SPSS (Statistical Package for the Social Sciences) is among the most popular statistical software packages for analyzing research data

• It is user friendly and menu driven
• The datasets offered with this textbook are set up as SPSS files

The Data Editor in SPSS

• The data editor in SPSS offers a convenient spreadsheet-like method of creating, editing, and viewing data

• There are two “views” within the data editor:
– Data View: Shows the actual data values
– Variable View: Shows variable information for all variables

Data View in the Data Editor

• Each column represents one variable; unique variable names (no more than eight characters long in older versions of SPSS) are shown at the top of each column

• Each row is a case, representing an individual participant

• The data view tab is at the bottom

Variable View in the Data Editor

• Variable View shows a wealth of information about how variables are coded, how they will be labeled in output, level of measurement, and so on

• The Variable View tab is at the bottom

Versions of SPSS

• New versions of SPSS are created regularly, to offer improved options for analysis and presentation

• Examples in this book were created in SPSS Version 16.0

• The student version of SPSS is available for analyzing relatively small datasets (no more than 50 variables and no more than 1,500 cases)

What is this?

FREQUENCY DISTRIBUTION

Same as Histogram?

Frequency Distributions

• A frequency distribution is a systematic arrangement of data values, with a count of how many times each value occurred in a dataset

• You can portray this as a table or as a graph

Constructing a Frequency Distribution

• List each data value in a sequence (usually, ascending order): 1, 2, 3, 4, 5…
• Tally each occurrence of the value
• Total the frequencies for each value (f)
• The sum of fs for all data values must equal the sample size: Σf = N
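The construction steps above can be sketched in a few lines of Python. The scores are hypothetical values made up for illustration; `Counter` from the standard library does the tallying, and the final check confirms that Σf = N.

```python
from collections import Counter

# Hypothetical data values (e.g., scores on a 1-5 scale); not from the slides.
scores = [3, 1, 4, 3, 5, 2, 3, 4, 1, 3]

# Tally each occurrence of each value.
frequencies = Counter(scores)

# List the values in ascending order with their frequency (f).
for value in sorted(frequencies):
    print(value, frequencies[value])

# The frequencies must sum to the sample size: sum of f = N.
N = len(scores)
assert sum(frequencies.values()) == N
```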

Elements of a Typical Frequency Distribution

• Data values
• Absolute frequencies (counts)
• Relative frequencies (percentages)
• Cumulative relative frequencies (the percentage for a given score value, combined with percentages for all preceding values)

Example...

• Let’s say we have 10 people of varying ages:
– Ages: 17, 26, 33, 35, 14, 55, 67, 35, 21, 19

• Let’s construct the frequency distribution of the age GROUPS: 0-25 yrs, 26-45 yrs, >45 yrs

Age group   Frequency   Relative Freq.   Cumulative Freq.
0-25        4           4/10 = 40%       40%
26-45       4           4/10 = 40%       40% + 40% = 80%
>45         2           2/10 = 20%       80% + 20% = 100%

Summary of Our Example

Data Value   Frequency (f)   Percentage (%)   Cumulative Percentage
0-25         4               40.0             40.0
26-45        4               40.0             80.0
>45          2               20.0             100.0
TOTAL        10              100.0
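The age-group example can be reproduced with a short Python sketch. The ages and group boundaries come from the example; the function name `frequency_table` and the way the groups are encoded as (label, lower, upper) tuples are assumptions made for illustration.

```python
ages = [17, 26, 33, 35, 14, 55, 67, 35, 21, 19]

# Age groups from the example: (label, lower bound, upper bound), inclusive.
groups = [("0-25", 0, 25), ("26-45", 26, 45), (">45", 46, float("inf"))]

def frequency_table(values, groups):
    """Return rows of (label, f, percent, cumulative percent)."""
    n = len(values)
    rows = []
    cumulative_f = 0
    for label, low, high in groups:
        # Absolute frequency: how many values fall in this group.
        f = sum(1 for v in values if low <= v <= high)
        cumulative_f += f
        rows.append((label, f, 100 * f / n, 100 * cumulative_f / n))
    return rows

table = frequency_table(ages, groups)
for label, f, pct, cum_pct in table:
    print(f"{label:<6} {f:>2} {pct:>6.1f} {cum_pct:>6.1f}")
```

The cumulative percentage is computed from the running count rather than by adding rounded percentages, which avoids rounding drift when there are many groups.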

Frequency Distributions and Measurement Levels

• Remember “measurement levels”?
– Nominal, ordinal, interval, ratio...

• Frequency distributions can be constructed for variables measured at any level of measurement

• BUT…for categorical (nominal-level) variables, cumulative frequencies do not make sense

• Also...

Frequency Distributions for Variables with Many Values

• When a variable has many possible values, a regular frequency distribution may be unwieldy
– For example, weight values (here, in pounds)

Weight   f
98       1
99       1
100      1
101      0
102      2
103      1
104      0
105      2
106      1
Etc. to 285 lb

Which is Why We Used “Age Group” instead of “Age”

• This is sometimes called a “grouped frequency distribution”

• In a grouped frequency distribution contiguous values are grouped into sets (class intervals)

• Typically, we use groupings that are psychologically appealing (e.g., 10-25 years rather than 7-13 years)

Weight   f
98       1
99       1
100      1
101      0
102      2
103      1
104      0
105      2
106      1
Etc. to 285 lb

Weight Interval   f
75 - 100          6
101 - 125         15
126 - 150         33
151 - 175         26
176 - 200         24
201 - 225         14
226 - 250         9
251 - 275         6
276 - 300         2

This grouping communicates information more conveniently than individual weights

Reporting Frequency Information

• Can be reported narratively in text (e.g., “83% of study participants were male”)

• In a frequency distribution table (multiple variables often presented in a single table)

• In a graph: Different graphs used for different types of data

Bar Graphs

• Bar graphs: Used for nominal (and many ordinal) level variables

• Bar graphs have a horizontal dimension (X axis) that specifies categories (i.e., data values)

• The vertical dimension (Y axis) specifies either frequencies or percentages

• Bars for each category drawn to the height that indicates the frequency or %

Bar Graphs

• Example of a bar graph

• Note the bars do not touch each other

Pie Chart

• Pie Charts: Also used for nominal (and many ordinal) level variables

• Circle is divided into pie-shaped wedges corresponding to percentages for a given category or data value

• All pieces add up to 100%
• Place wedges in order, with biggest wedge starting at “12 o’clock”

Pie Chart

• Example of a pie chart, for same marital status data

Histograms

• Histograms: Used for interval- and ratio-level data

• Similar to a bar graph, with an X and Y axis—but adjacent values are on a continuum so bars touch one another

• Data values on X axis are arranged from lowest to highest

• Bars are drawn to height to show frequency or percentage (Y axis)

Histograms

Example of a histogram: Heart rate data

[Figure: histogram. X axis: Heart rate in bpm (55 to 74); Y axis: f (0 to 12)]

Frequency Polygons

• Frequency polygons: Also used for interval- and ratio-level data

• Similar to histograms, but instead of bars, a dot is used above score values to designate frequency/percentage

• Better than histograms for showing shape of distribution of scores, and is usually preferred if variable is continuous

• Example of a frequency polygon (created in SPSS)
• Note that the line is brought down to zero at the score below the lowest data point (54) and at the score above the highest data point (75)

[Figure: Frequency Polygon, Heart Rate. X axis: Heart Rate in bpm (54 to 74); Y axis: Frequency (0 to 12)]

Shapes of Distributions

• Distributions of data values can be described in terms of:
– Modality
– Symmetry
– Kurtosis

Modality

• Modality concerns how many peaks (values with high frequencies) there are

• Unimodal = 1 peak
• Bimodal = 2 peaks
• Multimodal = multiple peaks
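As a rough illustration of modality, the sketch below counts peaks in a list of frequencies ordered by data value. The function name `count_peaks` and both frequency lists are hypothetical. In practice modality is usually judged by eye from a graph, and this simple rule will be fooled by plateaus and small random bumps.

```python
def count_peaks(frequencies):
    """Count local maxima in a list of frequencies ordered by data value.

    A peak is a frequency strictly higher than both of its neighbours;
    the two ends are compared against an implicit zero.
    """
    padded = [0] + list(frequencies) + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i - 1] < padded[i] > padded[i + 1]
    )

unimodal = [1, 3, 6, 9, 6, 3, 1]  # hypothetical frequencies, one peak
bimodal = [2, 7, 3, 1, 4, 8, 2]   # hypothetical frequencies, two peaks
print(count_peaks(unimodal), count_peaks(bimodal))
```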

Unimodal:

Bimodal:

How is this useful?

Example: Tuberculosis

• What is it?
• We apply the tuberculin skin test (also called the PPD, or purified protein derivative, test)
• Positive response is an “induration”: a hard, raised area with clearly defined margins at and around the injection site

What type of curve is this?

Distribution of systolic blood pressure for men (unimodal distribution)

Symmetry

• Symmetric Distribution: the two halves of the distribution, folded over in the middle, are identical

Symmetry

• Asymmetric (Skewed) Distribution: Peaks are “off center” and there is a tail trailing off for data values with low frequency
– Positive skew: Longer tail trails off to right (fewer people with high values, like for income)
– Negative skew: Longer tail trails off to left (fewer people with low values, like age at death)

Direction of Skew

• Examples of distributions with different skews:

Skewness Index

• Indexes have been developed to quantify degree of skewness
• One skewness index (e.g., in SPSS) has:
– Negative values, for a negative skew
– 0, for no skew
– Positive values, for a positive skew

• If the absolute value of the skewness index is less than twice the value of its standard error (to be explained later), the distribution can be treated as not skewed

Skewness Index Examples

• Skewness index = 0.80
• Standard error = 0.33
• Positive skew (0.80 is more than 2 × 0.33 = 0.66)

• Skewness index = -0.72
• Standard error = 0.34
• Negative skew (0.72 is more than 2 × 0.34 = 0.68)
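The rule of thumb for the skewness index can be written as a small Python check. The function names `is_skewed` and `describe_skew` are hypothetical; the threshold (twice the standard error) and the two worked examples come from the slides, and the rule is applied to the size of the index so that negative values are handled the same way as positive ones.

```python
def is_skewed(skewness_index, standard_error):
    """Rule of thumb: skewed when the size of the index is at least
    twice its standard error."""
    return abs(skewness_index) >= 2 * standard_error

def describe_skew(skewness_index, standard_error):
    """Return a short label for the direction of skew, if any."""
    if not is_skewed(skewness_index, standard_error):
        return "not skewed"
    return "positive skew" if skewness_index > 0 else "negative skew"

print(describe_skew(0.80, 0.33))   # first example from the slide
print(describe_skew(-0.72, 0.34))  # second example from the slide
```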

Kurtosis

• Kurtosis: Degree of pointedness or flatness of the distribution’s peak

• Leptokurtic: Very thin, sharp peak
• Platykurtic: Flat peak
• Mesokurtic: Neither pointy nor flat
– Like skewness, there is an index of kurtosis
  • Positive values: Greater peakedness
  • Negative values: Greater flatness
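To see how a kurtosis index takes positive and negative values, here is a minimal sketch using the simple moment-based formula (fourth central moment divided by the squared variance, minus 3). This is an assumption made for illustration: statistical packages typically report a small-sample-adjusted version, so their values may differ. Both datasets are hypothetical.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3.

    Roughly 0 for a normal distribution, positive for a sharper peak
    with heavier tails, negative for a flatter distribution.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # evenly spread (platykurtic)
peaked = [5, 5, 5, 5, 5, 5, 5, 5, 1, 9]  # tall centre, thin tails (leptokurtic)

print(f"flat:   {excess_kurtosis(flat):.2f}")
print(f"peaked: {excess_kurtosis(peaked):.2f}")
```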

Kurtosis Examples

Leptokurtic (+ index)

Platykurtic (– index)

Normal Distribution

What is this curve called?

Normal Distribution

• A normal distribution (aka normal curve, bell curve, Gaussian distribution, etc) is:

– Unimodal
– Symmetric
– Neither peaked nor flat

• Plays an important role in inferential statistics

We will re-visit the Normal Distribution in more depth in the future

Some human characteristics are normally distributed (approximately), like height

1 short person, 3 medium persons, 1 tall person

Uses of Frequency Distributions in Data Analysis

• First step in understanding your data!
– Begin by looking at the frequency distributions for all or most variables, to “get a feel” for the data
– Through inspection of frequency distributions, you can begin to assess how “clean” the data are
  • (will discuss next time)

Central Tendency

• “Central Tendency” is a characteristic of a distribution
– Describes how data are clustered around some value
– In other words, it’s a way of summarizing your data by identifying the one value that best represents the whole set
– There are several indices of central tendency, but 3 are the most important:
  • Mode
  • Median
  • Mean

Next class, we’ll get into these in more depth
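As a quick preview, the three indices can be computed for the ten ages used in the earlier frequency distribution example. This sketch relies only on Python's standard `statistics` module; reusing the ages here is a choice made for illustration.

```python
import statistics

# The ten ages from the frequency distribution example.
ages = [17, 26, 33, 35, 14, 55, 67, 35, 21, 19]

mode = statistics.mode(ages)      # most frequent value
median = statistics.median(ages)  # middle value of the sorted data
mean = statistics.mean(ages)      # arithmetic average

print(f"Mode: {mode}, Median: {median}, Mean: {mean}")
```

With an even number of values, the median is the average of the two middle values (here 26 and 33).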

Homework!

• P.17: A1-A4
• P.36: A1-A5