
Survey Design II

Lecture 6
Survey Research & Design in Psychology
James Neill, 2017
Creative Commons Attribution 4.0

Psychometric
Instrument Development

Image source: http://commons.wikimedia.org/wiki/File:Soft_ruler.jpg, CC-by-SA 3.0

7126/6667 Survey Research & Design in Psychology
Semester 1, 2017, University of Canberra, ACT, Australia
James T. Neill
Home page: http://en.wikiversity.org/wiki/Survey_research_and_design_in_psychology
Lecture page: http://en.wikiversity.org/wiki/Psychometric_instrument_development

Image source: http://commons.wikimedia.org/wiki/File:Soft_ruler.jpg
Image author: Lite, http://commons.wikimedia.org/wiki/User:Lite
Image license: GFDL 1.2+, CC-by-SA 3.0 unported

Description: This lecture recaps the previous lecture on exploratory factor analysis, then introduces psychometrics and the measurement of (fuzzy) concepts, including operationalisation, reliability (particularly internal consistency of multi-item measures), validity, and the creation of composite scores.

This lecture is accompanied by a computer-lab based tutorial:
http://en.wikiversity.org/wiki/Survey_research_and_design_in_psychology/Tutorials/Psychometrics

Overview

Recap: Exploratory factor analysis

Concepts & their measurement

Measurement error

Psychometrics

Reliability & validity

Composite scores

Writing up instrument development

Image source: http://commons.wikimedia.org/wiki/File:Information_icon4.svg
License: Public domain

Recap:
Exploratory Factor Analysis

What is factor analysis?

Factor analysis is a family of multivariate correlational methods used to identify clusters of covariance (called factors).

Two main purposes:

Theoretical (PAF)

Data reduction (PC)

Two main types:

Exploratory factor analysis (EFA)

Confirmatory factor analysis (CFA)

EFA steps

Test assumptions

Sample size:

5+ cases per variable (min.)

20+ cases per variable (ideal)

Another guideline: N > 200

Outliers & linearity

Factorability:

Correlation matrix: some rs > .3?

Anti-image correlation matrix diagonals > .5

Measures of Sampling Adequacy: KMO > ~.5 to .6; Bartlett's test significant?
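These factorability checks can be sketched with simulated data. The sketch below only applies the correlation-matrix rule of thumb; the data, item counts, and seed are illustrative, not the lecture's dataset. (KMO and Bartlett's test are available in dedicated packages such as Python's factor_analyzer.)

```python
# Quick factorability check: do some inter-item correlations exceed .3?
# Simulated data for illustration only.
import numpy as np

rng = np.random.default_rng(42)
# 250 cases x 6 items driven by one common factor, so the
# items should intercorrelate and be factorable.
factor = rng.normal(size=(250, 1))
items = factor + rng.normal(scale=0.8, size=(250, 6))

R = np.corrcoef(items, rowvar=False)        # 6 x 6 correlation matrix
off_diag = R[np.triu_indices_from(R, k=1)]  # the 15 unique off-diagonal rs

# Rule of thumb from the slide: at least some correlations should exceed .3
n_over_3 = int(np.sum(np.abs(off_diag) > .3))
print(f"{n_over_3} of {off_diag.size} correlations exceed |.3|")
```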

Image source: http://commons.wikimedia.org/wiki/File:Information_icon4.svg
License: Public domain
See also: http://en.wikiversity.org/wiki/Exploratory_factor_analysis/Data_analysis_tutorial

EFA steps

Select type of analysis

Extraction:

Principal Components (PC)

Principal Axis Factoring (PAF)

Rotation:

Orthogonal (Varimax)

Oblique (Oblimin)


Summary of EFA steps

Determine no. of factors

Theory?

Kaiser's criterion?

Eigenvalues and scree plot?

% variance explained?

Interpretability of weakest factor?


Summary of EFA steps

Select items

Use factor loadings to help identify which items belong in which factor

Drop items one at a time if they don't belong to any factor, e.g., consider dropping any items for which:

the primary (highest) loading is low (< .5?)

cross-loading(s) (on other factors) are high (> .3?)

the item wording doesn't match the meaning of the factor
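The item-selection rules above can be sketched as a simple filter over a loadings matrix. The matrix below is made up for illustration, and the .5 and .3 cut-offs are the slide's rules of thumb, not fixed values:

```python
# Keep an item if its primary (highest absolute) loading is >= .5
# and its largest cross-loading is < .3. Loadings are illustrative.
import numpy as np

loadings = np.array([
    [.78, .10, .05],   # item 1: clean primary loading on factor 1 -> keep
    [.42, .08, .12],   # item 2: primary loading too low (< .5)    -> drop
    [.61, .35, .02],   # item 3: cross-loading too high (> .3)     -> drop
    [.04, .72, .11],   # item 4: clean primary loading on factor 2 -> keep
])

keep = []
for i, row in enumerate(loadings, start=1):
    abs_row = np.abs(row)
    primary = abs_row.max()
    cross = np.sort(abs_row)[-2]       # largest non-primary loading
    if primary >= .5 and cross < .3:
        keep.append(i)

print("Items retained:", keep)   # -> [1, 4]
```

In practice items are dropped one at a time (as the slide says) and the analysis re-run, because each removal changes the loadings of the remaining items.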


Summary of EFA steps

Name and describe factors

Examine correlations amongst factors

Analyse internal reliability

Compute composite scores

Check factor structure across
sub-groups

Covered in this lecture


EFA example 4:
University student motivation

271 UC students responded to 24 university student motivation statements in 2008 using an 8-point Likert scale (False to True)

For example:
I study at university

to enhance my job prospects.

because other people have told me I should.

An EFA (PC extraction, Oblimin rotation) revealed 5 factors

Example EFA:
University student motivation


Example EFA:
Pattern matrix

This is a pattern matrix showing factor loadings.

Primary loadings for each item are above .5.

Cross-loadings are all below .3.

It shows a simple factor structure.


Example EFA:
University student motivation

Career & Qualifications
(6 items; α = .92)

Self Development
(5 items; α = .81)

Social Opportunities
(3 items; α = .90)

Altruism
(5 items; α = .90)

Social Pressure
(5 items; α = .94)


Example EFA:
University student motivation factor means and confidence intervals (error-bar graph)


Example EFA:
University student motivation factor correlations

Motivation                 Self Development   Social Enjoyment   Altruism   Social Pressure

Career & Qualifications    .26                .25                .24        .06
Self Development                              .33                .55       -.18
Social Enjoyment                                                 .26        .33
Altruism                                                                    .11


Exploratory factor analysis: Q & A

Questions?

Image source:https://commons.wikimedia.org/wiki/File:Pleiades_large.jpg

Image source: https://commons.wikimedia.org/wiki/File:Pleiades_large.jpg
Image author: NASA, ESA, AURA/Caltech, Palomar Observatory
The science team consists of: D. Soderblom and E. Nelan (STScI), F. Benedict and B. Arthur (U. Texas), and B. Jones (Lick Obs.)
License: https://en.wikipedia.org/wiki/Public_domain

Psychometric Instrument Development


Readings: Psychometrics

Bryman & Cramer (1997). Concepts and their measurement. [eReserve]

DeCoster, J. (2000). Scale construction notes. [Online]

Howitt & Cramer (2005). Reliability and validity: Evaluating the value of tests and measures. [eReserve]

Howitt & Cramer (2014). Ch 37: Reliability in scales and measurement: Consistency and measurement. [Textbook/eReserve]

Wikiversity. Measurement error. [Online]

Wikiversity. Reliability and validity. [Online]

Concepts &
their measurement





Operationalising
fuzzy concepts

Image source: https://www.flickr.com/photos/lenore-m/441579944/in/set-72157600039854497

Image source: https://www.flickr.com/photos/lenore-m/441579944/in/set-72157600039854497
Image author: Lenore Edman
Image license: CC by 2.0, https://creativecommons.org/licenses/by/2.0/

Concepts & their measurement:
Bryman & Cramer (1997)

Concepts:

express common elements in the world (to which we give a name)

form a linchpin in the process of social research

Bryman & Cramer Concepts & their Measurement (1997):
"Concepts form a linchpin in the process of social research. Hypotheses contain concepts which are the products of our reflections on the world. Concepts express common elements in the world to which we give a name." (p. 53)

Concepts & their measurement:
Bryman & Cramer (1997)

Hypotheses specify expected relations between concepts

Bryman & Cramer, Concepts and their Measurement (1997): "Once formulated, a concept and the concepts with which it is purportedly associated, such as social class and authoritarianism, will need to be operationally defined, in order for systematic research to be conducted in relation to it..."

Concepts & their measurement:
Bryman & Cramer (1997)

Operationalisation:

A concept needs to be operationally defined in order for systematic research to be conducted in relation to it.

"An operational definition specifies the procedures (operations) that will permit differences between individuals in respect of the concept(s) concerned to be precisely specified ..."


Concepts & their measurement:
Bryman & Cramer (1997)

... What we are in reality talking about here is measurement, that is, the assignment of numbers to the units of analysis - be they people, organizations, or nations - to which a concept refers."

Bryman & Cramer Concepts & their Measurement (1997, p. ?)

Image (left) source: http://www.flickr.com/photos/iliahi/408971482/
By Maui in Vermont: http://www.flickr.com/photos/iliahi
License: CC-by-A 2.0

Image (right) source: http://commons.wikimedia.org/wiki/File:3DTomo.jpgBy Tomograph: http://de.wikipedia.org/wiki/Benutzer:TomographLicense: GFDL

Operationalisation

The act of making a fuzzy concept measurable.

Social science often uses multi-item measures to assess related but distinct aspects of a fuzzy concept.

Image source: http://commons.wikimedia.org/wiki/Image:Die_bone.jpg
License: Creative Commons Attribution 3.0 Unported

Operationalisation steps

Brainstorm indicators of a concept

Define the concept

Draft measurement items

Pre-test and pilot test

Examine psychometric properties
- how precise are the measures?

Redraft/refine and re-test

This is a cyclical or iterative process.

Operationalising a fuzzy concept: Example (Brainstorming indicators)

Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/

Fuzzy concepts - Mindmap

Nurse empowerment

Image source: James Neill, Creative Commons Attribution 3.0; Adapted from original image source: http://www.uwo.ca/fhs/nursing/laschinger/image002.gif (Retrieved 20 March, 2007). As of 25 March, 2008, the original image could no longer be found. It was based on research by Laschinger and colleagues on nurse empowerment.

Factor analysis process

Image source: Figure 4.2 Bryman & Cramer (1997)


Measurement error

Image source: http://commons.wikimedia.org/wiki/File:Noise_effect.svg

Image source: http://commons.wikimedia.org/wiki/File:Noise_effect.svgImage license: GFDL 1.2+

Measurement error

Measurement error is statistical deviation from the true value caused by the measurement procedure.

Observed score = true score ± measurement error

Measurement error = systematic error ± random error

Systematic error = sampling error ± non-sampling error
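The decomposition above can be illustrated with a small simulation, using the bathroom-scales example below. The true weight of 70 kg and the error magnitudes are hypothetical numbers chosen for illustration:

```python
# observed = true + systematic + random.
# Systematic error: mis-calibrated scales, always +0.5 kg.
# Random error: small fluctuation on each reading.
import numpy as np

rng = np.random.default_rng(0)
true_score = 70.0
systematic_error = 0.5
random_error = rng.normal(scale=0.2, size=1000)

observed = true_score + systematic_error + random_error

# Averaging many measurements cancels random error but NOT systematic error:
print(round(observed.mean(), 2))   # ~70.5, still 0.5 kg above the true score
```

This is why more data fixes noise (random error) but not bias (systematic error): only better calibration or better procedure removes the systematic component.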

Systematic and random error

Measurement error

Systematic error e.g., bathroom scales aren't calibrated properly, and every measurement is 0.5kg too high. The error is the same for each measurement.

Random error e.g., measure your weight 3 times using the same scales but get three slightly different readings. The error is different for each measurement.

Image source: http://www.freestockphotos.biz/stockphoto/17079

Image source: http://www.freestockphotos.biz/stockphoto/17079Image author: National Cancer Institute, https://visualsonline.cancer.gov/Image license: Public domain

Sources of systematic error

Test reliability & validity (e.g., unreliable or invalid tests)

Sampling (non-representative sample)

Researcher bias (e.g., researcher favours a hypothesis)

Paradigm (e.g., focus on positivism)

Respondent bias (e.g., social desirability)

Non-sampling error

Image source: Unknown

Paradigm (e.g., assumptions, focus, collection method)

Personal researcher bias (conscious & unconscious)

Sampling (e.g., representativeness of sample)

Non-sampling (e.g., non-response, inaccurate response due to unreliable measurements, misunderstanding, social desirability, faking bad, researcher expectancy, administrative error)

Measurement precision & noise

The lower the precision, the more participants are needed to make up for the "noise" in the measurements.

Even with a larger sample, noisy data can be hard to interpret.

Especially when testing and assessing individual clients, special care is needed when interpreting results of noisy tests.

http://www.sportsci.org/resource/stats/precision.html


Minimising measurement error

Standardise administration conditions with clear instructions and questions

Minimise potential demand characteristics (e.g., train interviewers)

Use multiple indicators for fuzzy constructs

Minimising measurement error

Obtain a representative sample:

Use probability sampling, if possible

For non-probability sampling, use strategies to minimise selection bias

Maximise response rate:

Pre-survey contact

Minimise length / time / hassle

Rewards / incentives

Coloured paper

Call backs / reminders

Ensure administrative accuracy:

Set up efficient coding, with well-labelled variables

Check data (double-check at least a portion of the data)


Psychometrics

Image source: http://commons.wikimedia.org/wiki/File:Information_icon4.svg

Image source: http://commons.wikimedia.org/wiki/File:Information_icon4.svgLicense: Public domain

Image source: http://www.flickr.com/photos/ajawin/2614485186/By akawin http://www.flickr.com/photos/ajawin/License: Creative Commons by Attribution 2.0 http://creativecommons.org/licenses/by/2.0/deed.en

Psychometrics: Goal

To validly measure differences between individuals and groups in psychosocial qualities such as attitudes and personality.

Psychometrics is the field of study concerned with the theory and technique of educational and psychological measurement, which includes the measurement of knowledge, abilities, attitudes, and personality traits. The field is primarily concerned with the study of differences between individuals and between groups of individuals. http://en.wikipedia.org/wiki/Psychometrics

Psychometric tasks

Develop approaches and procedures (theory and practice) for measuring psychological phenomena

Design and test psychological measurement instrumentation
(e.g., examine and improve reliability and validity of psychological tests)


Psychometrics: As test-taking grows, test-makers grow rarer

"Psychometrics, one of the most obscure, esoteric and cerebral professions in America, is now also one of the hottest."
- As test-taking grows, test-makers grow rarer, David M. Herszenhorn, May 5, 2006, New York Times

Psychometricians are in demand due to increased testing of educational and psychological capacity and performance.

http://www.nytimes.com/2006/05/05/education/05testers.html
In Australia, probably the largest collection of psychometricians is at the Australian Council for Educational Research: http://www.acer.edu.au/

Psychometric methods

Factor analysis:

Exploratory

Confirmatory

Classical test theory:

Reliability

Validity

Also item response modeling

Reliability & Validity

Image source: http://www.flickr.com/photos/psd/17433783/in/photostream


Image source: http://www.flickr.com/photos/psd/17433783/in/photostreamBy psd http://www.flickr.com/photos/psd/License: Creative Commons by Attribution 2.0 http://creativecommons.org/licenses/by/2.0/deed.en

Reliability and validity
(Howitt & Cramer, 2005)

Reliability and validity (classical test theory) are ways of evaluating the accuracy of psychological tests and measures.

Reliability is about consistency:

of items within the measure

of the measure over time

Validity is about whether the measure actually measures what it is intended to measure.

Internal consistency (single administration) and test-retest reliability (multiple administrations) are based on classical test theory. A more advanced/refined reliability technique is based on Item Response Theory, which involves examining the extent to which each item discriminates between individuals with high/low overall scores. For more info see: http://en.wikipedia.org/wiki/Reliability_(psychometric)

Reliability vs. validity

In classical test theory, reliability is generally thought to be necessary for validity, but it does not guarantee validity.

In practice, a test of a relatively changeable psychological construct, such as suicide ideation, may be valid (i.e., accurate) but not particularly reliable over time (because suicide ideation is likely to fluctuate).

Image source: http://www.socialresearchmethods.net/kb/relandval.php
Reliability is about the consistency of a measure. Validity is about whether a test actually measures what it's meant to measure. Validity requires reliability, but reliability alone is not sufficient for validity. See the more complex discussion of this at: http://en.wikiversity.org/wiki/Reliability_and_validity

Reliability vs. validity

Reliability A car which starts every time is reliable.

A car which does not start every time is unreliable.

Validity A car which always reaches the desired destination is valid.

A car which misses the desired destination is not valid.

Image source: https://commons.wikimedia.org/wiki/File:Aiga_carrental_cropped.svg

Image source: https://commons.wikimedia.org/wiki/File:Aiga_carrental_cropped.svgImage author: Paucobot, https://commons.wikimedia.org/wiki/User:PaucabotImage license: CC-by-3.0 unported, https://creativecommons.org/licenses/by/3.0/deed.en

Reliability and validity
(Howitt & Cramer, 2005)

Reliability and validity are not inherent characteristics of measures. They are affected by the context and purpose of the measurement: a measure that is valid for one purpose may not be valid for another purpose.


Reliability

Reproducibility of a measurement

Image source: Unknown
[Figure: target diagrams illustrating a measure that is reliable but not valid vs. one that is reliable and valid]

Types of reliability

Internal consistency: correlation among multiple items in a factor (Cronbach's alpha, α)

Test-retest reliability: correlation between the test at one time and another (product-moment correlation, r)

Inter-rater reliability: correlation between one observer and another (Kappa, κ)


Reliability rule of thumb

< .6 = unreliable
.6 = OK
.7 = good
.8 = very good, strong
.9 = excellent
> .95 = may be overly reliable or redundant; this is subjective and depends on the nature of what is being measured

Reliability rule of thumb

Rule of thumb: reliability coefficients should be over .70, up to approx. .95

Image source: Table 7, Fabrigar et al. (1999)

Internal consistency
(or internal reliability)

Internal consistency refers to:

how well multiple items combine as a measure of a single concept

the extent to which responses to multiple items are consistent with one another

Internal consistency can be measured by:

Split-half reliability

Odd-even reliability

Cronbach's alpha (α)

Internal consistency
(Recoding)

If dealing with a mixture of positively and negatively scored items, remember to ensure that negatively-worded items are recoded.

Types of internal consistency:
Split-half reliability

Sum the scores for the first half (e.g., 1, 2, 3) of the items.

Sum the scores for the second half (e.g., 4, 5, 6) of the items.

Compute a correlation between the sums of the two halves.

Reference: Howitt & Cramer (2005), p. 221
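The three steps above can be sketched with simulated data (the six-item scale and sample size are hypothetical). The final Spearman-Brown step is an optional extra not shown on the slide: it estimates the reliability of the full-length scale from the correlation between the two half-length scales.

```python
# Split-half reliability: correlate the sum of the first half of the
# items with the sum of the second half. Simulated data for illustration.
import numpy as np

rng = np.random.default_rng(1)
factor = rng.normal(size=(300, 1))
items = factor + rng.normal(scale=1.0, size=(300, 6))   # 6-item scale

first_half = items[:, :3].sum(axis=1)    # sum items 1, 2, 3
second_half = items[:, 3:].sum(axis=1)   # sum items 4, 5, 6
r = np.corrcoef(first_half, second_half)[0, 1]

# Spearman-Brown correction (optional extra, not on the slide):
full_length = 2 * r / (1 + r)
print(round(r, 2), round(full_length, 2))
```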

Types of internal consistency:
Odd-even reliability

Sum the scores for odd-numbered items
(e.g., 1, 3, 5)

Sum the scores for even-numbered items
(e.g., 2, 4, 6)

Compute a correlation between the sums of the two halves.

Reference: Howitt & Cramer (2005), p. 221

Types of internal reliability:
Alpha (Cronbach's α)

Averages all possible split-half reliability coefficients.

Akin to a single score which represents the degree of intercorrelation amongst the items.

Most commonly used indicator of internal reliability

Reference: Howitt & Cramer (2005), p. 221
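Cronbach's alpha can also be computed directly from its variance form: α = k/(k-1) × (1 - Σ item variances / variance of the total score). A minimal sketch with simulated item scores (not the lecture's data):

```python
# Cronbach's alpha from its variance formula. Data are simulated:
# 8 items driven by one common factor, so alpha should be high.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: cases x items matrix of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(2)
factor = rng.normal(size=(400, 1))
items = factor + rng.normal(scale=0.9, size=(400, 8))

print(round(cronbach_alpha(items), 2))
```

This mirrors the "average of all possible split-halves" interpretation: the more the items covary relative to their individual variances, the closer α gets to 1.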

More items → greater reliability
(the more items, the more rounded the measure)

Minimum items to create a factor is 2.

There is no maximum; however, by the law of diminishing returns, each additional item will add less and less to the reliability.

Typically ~ 4 to 12 items per factor are used.

Final decision is subjective and depends on research context

How many items per factor?

Internal reliability example:
Student-rated
quality of maths teaching

10-item scale measuring students' assessment of the educational quality of their maths classes

4-point Likert scale ranging from:
strongly disagree to strongly agree

This example is from Francis 6.1

Quality of mathematics teaching

My maths teacher is friendly and cares about me.

The work we do in our maths class is well organised.

My maths teacher expects high standards of work from everyone.

My maths teacher helps me to learn.

I enjoy the work I do in maths classes.

+ 5 more

This example is from Francis 6.1

Internal reliability example: Quality of maths teaching

This example is from Francis 6.1

Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
SPSS: Analyze > Scale > Reliability Analysis

SPSS:
Corrected Item-total correlation

Item-total correlations
should be > ~.5

This example is from Francis 6.1
Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
SPSS: Analyze > Scale > Reliability Analysis

SPSS: Cronbach's α

If "Cronbach's α if item deleted" is higher than the overall α, consider removing the item.

This example is from Francis 6.1
Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
SPSS: Analyze > Scale > Reliability Analysis

Item-total Statistics

           Scale Mean   Scale Variance   Corrected Item-   Alpha if
           if Item      if Item          Total             Item
           Deleted      Deleted          Correlation       Deleted

MATHS1     25.2749      25.5752          .6614             .8629
MATHS2     25.0333      26.5322          .6235             .8661
MATHS3     25.0192      30.5174          .0996             .9021
MATHS4     24.9786      25.8671          .7255             .8589
MATHS5     25.4664      25.6455          .6707             .8622
MATHS6     25.0813      24.9830          .7114             .8587
MATHS7     25.0909      26.4215          .6208             .8662
MATHS8     25.8699      25.7345          .6513             .8637
MATHS9     25.0340      26.1201          .6762             .8623
MATHS10    25.4642      25.7578          .6495             .8638

Reliability Coefficients

N of Cases = 1353.0    N of Items = 10

Alpha = .8790

SPSS: Reliability output

Remove this item.

Maths3 does not correlate well with the other items and the Cronbach's alpha would increase without this item.

This example is from Francis 6.1

Item-total Statistics

           Scale Mean   Scale Variance   Corrected Item-   Alpha if
           if Item      if Item          Total             Item
           Deleted      Deleted          Correlation       Deleted

MATHS1     22.2694      24.0699          .6821             .8907
MATHS2     22.0280      25.2710          .6078             .8961
MATHS4     21.9727      24.4372          .7365             .8871
MATHS5     22.4605      24.2235          .6801             .8909
MATHS6     22.0753      23.5423          .7255             .8873
MATHS7     22.0849      25.0777          .6166             .8955
MATHS8     22.8642      24.3449          .6562             .8927
MATHS9     22.0280      24.5812          .7015             .8895
MATHS10    22.4590      24.3859          .6524             .8930

Reliability Coefficients

N of Cases = 1355.0    N of Items = 9

Alpha = .9024

SPSS: Reliability output

Alpha improves.

This example is from Francis 6.1
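The "alpha if item deleted" logic shown in the SPSS output can be reproduced in a few lines. The data below are simulated to mimic the MATHS example (five coherent items plus one unrelated item standing in for MATHS3); they are not the actual Francis 6.1 data.

```python
# Flag items whose removal would raise Cronbach's alpha.
import numpy as np

def cronbach_alpha(items):
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(3)
factor = rng.normal(size=(500, 1))
good = factor + rng.normal(scale=0.8, size=(500, 5))   # 5 coherent items
bad = rng.normal(size=(500, 1))                        # 1 unrelated item
items = np.hstack([good, bad])

overall = cronbach_alpha(items)
for i in range(items.shape[1]):
    a = cronbach_alpha(np.delete(items, i, axis=1))
    flag = "  <- consider dropping" if a > overall else ""
    print(f"item {i + 1}: alpha if deleted = {a:.3f}{flag}")
```

Only the unrelated sixth item should be flagged: deleting a coherent item lowers alpha, while deleting the noise item raises it, which is exactly the pattern MATHS3 shows in the output above.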

Reliabilities: Example table

This example is from Francis 6.1
Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/
Life Effectiveness Questionnaire: http://wilderdom.com/leq.html

For the lab report, students won't have test-retest reliability, but they can include a column that indicates the number of items.

Validity

Validity is the extent to which an instrument actually measures what it purports to measure.

Validity = does the test measure what it's meant to measure?

Image source: Unknown
[Figure: target diagram illustrating a measure that is both reliable and valid]
http://en.wikipedia.org/wiki/Construct_Validity

Validity

Validity is multifaceted and includes: Comparing wording of the items with theory and expert opinion

Examining correlations with similar and dissimilar measures

Testing how well the measure predicts the future

From http://www.socialresearchmethods.net/kb/measval.htm

Types of validity

Face validity

Content validity

Criterion validity:

Concurrent validity

Predictive validity

Construct validity:

Convergent validity

Discriminant validity

From http://www.socialresearchmethods.net/kb/measval.htm

Face validity
(low-level of importance overall)

Asks:
"At face-value, do the questions appear to measure what the test purports to measure?"

Important for:
Respondent buy-in

How assessed:
Read the test items

Adapted from: http://www.psyasia.com/supportsuite/index.php?_m=knowledgebase&_a=viewarticle&kbarticleid=85&nav=0
Prima facie extent to which an item is judged to reflect the target construct. e.g., a test of ability to add two-digit numbers should cover the full range of combinations of digits. A test with only one-digit numbers, or only even numbers, would (on the face of it) not have good coverage of the content domain.

Content validity
(next level of importance)

Asks:
"Are questions measuring the complete construct?"

Important for:
Ensuring holistic assessment

How assessed:
Diverse means of item generation (lit. review, theory, interviews, expert review)

Adapted from: http://www.psyasia.com/supportsuite/index.php?_m=knowledgebase&_a=viewarticle&kbarticleid=85&nav=0
Extent to which test content covers a representative sample of the domain to be measured. Compare test with:

existing literature

expert panels

qualitative interviews / focus groups with target sample

Criterion validity
(high importance)

Asks:
"Can a test score predict real world outcomes?"

Important for:
Test relevance and usefulness

How assessed:
Concurrent validity: Correlate test scores with recognised external criteria such as performance appraisal scores

Predictive validity: Correlate test scores with future outcome e.g., offender risk rating with recidivism

Adapted from: http://www.psyasia.com/supportsuite/index.php?_m=knowledgebase&_a=viewarticle&kbarticleid=85&nav=0
Involves the correlation between the test and a criterion variable (or variables) taken as representative of the construct.

Concurrent validity: correlation between the measure and other recognised measures of the target construct

Predictive validity: extent to which a measure predicts something that it theoretically should be able to predict

Criterion refers to the true score. Criterion validity refers to measures of the relationship between the criterion variable and the measured variable.
http://en.wikipedia.org/wiki/Concurrent_validity
http://www.socialresearchmethods.net/kb/measval.htm

Construct validity
(high importance)

Asks:
Does the test assess the construct it purports to?
("the truth, the whole truth and nothing but the truth.")

Important for:
Making inferences from operationalisations to theoretical constructs

How assessed:

Theoretical (is the theory about the construct valid?)

Statistical:

Convergent: correlation with similar measures

Discriminant: not correlated with other constructs

Adapted from: http://www.psyasia.com/supportsuite/index.php?_m=knowledgebase&_a=viewarticle&kbarticleid=85&nav=0
Totality of evidence about whether a particular operationalisation of a construct adequately represents what is intended.

Not distinct from support for the substantive theory of the construct.

Involves empirical and theoretical support.

Includes:

statistical analyses of the internal structure of the test, including the relationships between responses to different test items (internal consistency)

relationships between the test and measures of other constructs

Convergent validity: extent to which a measure correlates with measures with which it theoretically should be associated.

Discriminant validity: extent to which a measure does not correlate with measures with which it theoretically should not be associated.

Construct validity
(high importance)

Image source: http://www.socialresearchmethods.net/kb/considea.php
Image license: © 2006, William M.K. Trochim, All Rights Reserved

Composite Scores

Image source: https://commons.wikimedia.org/wiki/File:PEO-hamburger.svg

Image source: https://commons.wikimedia.org/wiki/File:PEO-hamburger.svgImage author: PhantomOpenEmoji, https://github.com/break24/PhantomOpenEmoji/graphs/contributorsImage license: CC by 3.0 unported, https://creativecommons.org/licenses/by/3.0/deed.en

Composite scores

Combine item scores into an overall factor score which represents individual differences in each of the target constructs. The new composite score can then be used for:

descriptive statistics and histograms

correlations

IVs and/or DVs in inferential analyses such as MLR and ANOVA

Composite scores

There are two ways of creating composite scores:

Unit weighting

Regression weighting

Unit weighting

Average (or total) of item scores within a factor.
(each variable is equally weighted)

X = mean(y1yp)

Unit Weighting

[Diagram: composite score X computed from four items, each with an equal weight of .25]
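A unit-weighted composite can be sketched in Python (illustrative item responses; each item contributes equally, as in the diagram above):

```python
import numpy as np

# Item responses for three respondents on a 4-item factor
items = np.array([
    [4, 5, 3, 4],
    [2, 2, 3, 1],
    [5, 4, 4, 5],
])

# Unit-weighted composite: the mean of each respondent's item
# scores, i.e. every item gets the same weight (1/4 = .25)
composite = items.mean(axis=1)
print(composite)  # values: 4.0, 2.0, 4.5
```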

Creating composite scores: Dealing with missing data

To maximise the sample size, consider computing composite scores in a way that allows for some missing data.

SPSS syntax:
Compute X = mean(v1, v2, v3, v4, v5, v6)
Compute X = mean.4(v1, v2, v3, v4, v5, v6)
mean.n specifies a minimum number of valid items; if the minimum isn't available, the composite score will be missing. In this example, X will be computed for a case only when the case has responses to at least 4 of the 6 items.
How many items can be missed? Depends on overall reliability. A rule of thumb: Allow 1 missing per 4 to 5 items

Allow 2 missing per 6 to 8 items

Allow 3+ missing per 9+ items

Composite scores:
Missing data

A researcher may decide to be more or less conservative depending on the factor's reliability, sample size, and the nature of the study.

compute X = mean(v1, v2, v3, v4, v5, v6)
(note: a missing value will be returned for a case if 1 or more of its items are missing data)

compute X = mean.4(v1, v2, v3, v4, v5, v6)
(note: in this case a missing value will be returned for a case if 3 or more of its items are missing data; the .4 means calculate a mean using data from 4 or more items, otherwise make X missing)
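The SPSS mean.n behaviour can be sketched in Python. The helper below is illustrative, not part of any library:

```python
import numpy as np

def mean_n(values, min_valid):
    """Mimic SPSS MEAN.n: average the non-missing values, but
    return missing (NaN) when fewer than min_valid are present."""
    arr = np.asarray(values, dtype=float)
    valid = arr[~np.isnan(arr)]
    if valid.size < min_valid:
        return float("nan")
    return float(valid.mean())

# 6-item scale, allowing up to 2 missing items (as in mean.4)
nan = float("nan")
print(mean_n([3, 4, 5, 4, nan, nan], 4))  # 4.0 (4 valid items)
print(mean_n([3, 4, nan, nan, nan, nan], 4))  # nan (only 2 valid)
```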

Regression weighting

Factor score regression weighting:
The contribution of each item to the composite score is weighted to reflect responses to some items more than other items.
X = .20*a + .19*b + .27*c + .34*d

[Diagram: factor score X computed from items a, b, c, and d, with weights .20, .19, .27, and .34 respectively]

This is arguably more valid, but the advantage may be marginal, and it makes factor scores between
studies more difficult to compare.

X = r1*y1 + ... + rp*yp

Regression weighting will also include small weights for non-target items.
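A small sketch of regression weighting in Python, assuming the factor score weights have already been estimated (the .20/.19/.27/.34 values from the slide are reused; the standardised item data are simulated):

```python
import numpy as np

# Simulated standardised item scores
# (rows = respondents, columns = items a..d)
rng = np.random.default_rng(1)
z_items = rng.normal(size=(100, 4))

# Factor score regression weights, e.g. as saved by a factor
# analysis program (illustrative values)
weights = np.array([0.20, 0.19, 0.27, 0.34])

# Each respondent's factor score is the weighted sum of their
# items: X = .20*a + .19*b + .27*c + .34*d
factor_scores = z_items @ weights
print(factor_scores[:3])
```

Because the weights are sample-specific, scores produced this way are harder to compare across studies than unit-weighted composites.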

Regression weighting

Two calculation methods:
Manual (use Compute: new variable name = MEAN.*(list of variable names separated by commas))
Automatic (use Factor Analysis - Factor Scores - Save as variables - Regression)

Image source: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/

Regression weighting SPSS data

Data view: Data are standardised, centred around 0

Variable view: Variables auto-calculated through SPSS factor analysis

Image sources: James Neill, Creative Commons Attribution-Share Alike 2.5 Australia, http://creativecommons.org/licenses/by-sa/2.5/au/

Writing up
instrument development

Image source: http://commons.wikimedia.org/wiki/File:Information_icon4.svgLicense: Public domain

Image source: http://www.flickr.com/photos/mezone/21970578/By http://www.flickr.com/photos/mezone/License: Creative Commons by Attribution 2.0 http://creativecommons.org/licenses/by/2.0/deed.en

Writing up instrument development

Introduction: Review previous literature about the construct's underlying factors; consider both theory and research

Generate a research question (e.g., "What are the underlying factors of X?") or make a hypothesis about the number of factors and what they will represent.

Writing up instrument development

Method: Materials/Instrumentation: summarise the design and development of the measures and the expected factor structure,
e.g., present a table of the expected factors and their operational definitions.

Writing up instrument development

Results: Factor analysis: Assumption testing

Extraction method & rotation

# of factors, with names and definitions

# of items removed and rationale

Item factor loadings & communalities

Factor correlations

Reliability for each factor

Composite scores for each factor

Correlations between factors

Writing up instrument development

Discussion: Theoretical underpinning: Was it supported by the data? What adaptations should be made to the theory?

Quality/usefulness of measure: Provide an objective, critical assessment, reflecting the measure's strengths and weaknesses

Recommendations for further improvement

Writing up a factor analysis

Download examples: http://goo.gl/fD2qby

Summary

Summary:
What is psychometrics?

Science of psychological measurement

Goal: Validly measure individual psychosocial differences

Design and test psychological measures, e.g., using: Factor analysis

Reliability and validity

Summary:
Concepts & their measurement

Concepts name common elements

Hypotheses identify relations between concepts

Brainstorm indicators of a concept

Define the concept

Draft measurement items

Pre-test and pilot test

Examine psychometric properties

Redraft/refine and re-test

Summary: Measurement error

Deviation of measure from true score

Sources: Non-sampling (e.g., paradigm, respondent bias, researcher bias)

Sampling (e.g., non-representativeness)

How to minimise: Well-designed measures

Representative sampling

Reduce demand effects

Maximise response rate

Ensure administrative accuracy

Summary: Reliability

Consistency or reproducibility

Types: Internal consistency

Test-retest reliability

Rule of thumb: > .6 OK

> .8 Very good

Internal consistency: Split-half

Odd-even

Cronbach's Alpha
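Cronbach's alpha can be computed from the item variances and the variance of the total score: alpha = k/(k-1) * (1 - sum of item variances / variance of total). A minimal sketch in Python, with illustrative data:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an items matrix
    (rows = respondents, columns = items)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_vars / total_var)

# Illustrative 4-item scale, 5 respondents
data = np.array([
    [4, 5, 4, 4],
    [2, 3, 2, 1],
    [5, 4, 5, 5],
    [3, 3, 3, 2],
    [1, 2, 1, 2],
])
alpha = cronbach_alpha(data)
print(round(alpha, 2))  # 0.95 for this data: very good
```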

Summary: Validity

Extent to which a measure measures what it is intended to measure

Multifaceted: Compare with theory and expert opinion

Correlations with similar and dissimilar measures

Predicts future outcomes

Summary: Composite scores

Ways of creating composite (factor) scores: Unit weighting: Total of items, or

Average of items
(recommended for lab report)

Regression weighting: Each item is weighted by its importance to measuring the underlying factor (based on regression weights)

Introduction: Review constructs & previous structures

Generate research question or hypothesis

Method: Explain measures and their development

Results: Factor analysis

Reliability of factors

Descriptive statistics for composite scores

Correlations between factors

Discussion: Theory? / Measure? / Recommendations?

Summary: Writing up
instrument development


Open Office Impress

This presentation was made using Open Office Impress.

Free and open source software.

http://www.openoffice.org/product/impress.html
