
Psychology 5130 Lecture 3
Summated Scale Construction

Psychological Scales

Scale: An instrument, usually a questionnaire, designed to measure a person's position on some dimension. Examples: Intelligence scale; conscientiousness scale; self-esteem scale, etc.

Can be responded to by oneself or by others – yielding what is called self-report or other-report.

Can assess ability, or attitudes, or personality

Just a few of the many important dimensions:

The Big 5 dimensions of Extraversion, Agreeableness, Conscientiousness, Stability, and Openness; Narcissism; Cognitive ability; Job satisfaction; Organizational commitment; Positive affectivity; Negative affectivity; Intent to leave; Job embeddedness; Depression; Integrity; Self-esteem; . . .

Do we have enough psychological scales?

Here’s what one author said more than 25 years ago . . .

Focus here will be on summated scales to measure personality-related constructs.

Historical Perspective

Three types

Guttman
Thurstone
Likert (pronounced Likkert). A Likert scale (/ˈlɪk.ərt/ LIK-ərt, but more commonly pronounced /ˈlaɪ.kərt/ LY-kərt).

Now, most scales are Likert.

P513 Lecture 3: Scale Construction - 1 5/8/2023

Likert Scale / Summated Scale

A set of statements regarding a construct, presented to respondents with instructions to indicate the extent of agreement with each statement on a response scale of from 2 to 11 alternatives.

Each response is assigned a numeric value.

Respondent's position on the dimension being measured, that is, the respondent's score, is the sum or mean of the values of responses to the statements related to that dimension.

An example questionnaire from which 5 scale scores are extracted follows on the next two pages.

It’s the Sample 50-item Big Five questionnaire taken from the web site of the International Personality Item Pool (IPIP) (http://ipip.ori.org/ipip/).

The 5 constructs measured by Big Five questionnaires are often called domains.

The items on the web site have been modified so that each is a complete sentence. For example, item 1 on the web site is “Am the life of the party.” Here it is “I am the life of the party.”

Even numbered items have been shaded. I have no evidence that such shading is beneficial.

The IPIP web site recommends a 5-point response scale. I prefer a 7-point response scale.

If you need a 50-item Big Five questionnaire, you may copy and use what follows.

Items are: E: 1,6,11,16,21,26,31,36,41,46

A: 2,7,12,17,22,27,32,37,42,47

C: 3,8,13,18,23,28,33,38,43,48

S: 4,9,14,19,24,29,34,39,44,49

O: 5,10,15,20,25,30,35,40,45,50

Note the periodicity in the placement of items – every 5th item is from the same domain.

Such periodicity is common in multiple domain scales.
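Because of the periodic placement, the domain of any item can be computed rather than looked up. A minimal Python sketch (the function names are mine, not from the IPIP site):

```python
# Map each of the 50 items to its Big Five domain using the periodicity
# noted above: items 1-5 belong to E, A, C, S, O, and the pattern
# repeats every 5 items.

DOMAINS = ["E", "A", "C", "S", "O"]

def domain_of(item_number: int) -> str:
    """Domain letter for a 1-based item number."""
    return DOMAINS[(item_number - 1) % 5]

def items_for(domain: str) -> list[int]:
    """1-based item numbers belonging to a domain."""
    return [i for i in range(1, 51) if domain_of(i) == domain]
```

For example, `items_for("E")` recovers the Extraversion key 1, 6, 11, . . ., 46 listed above.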


Questionnaire ID__________________________

Circle the number that represents how accurately the statement describes you.

7 = Completely Accurate
6 = Very Accurate
5 = Probably Accurate
4 = Sometimes Accurate, Sometimes Inaccurate
3 = Probably Inaccurate
2 = Very Inaccurate
1 = Completely Inaccurate

1. I am the life of the party. 1 2 3 4 5 6 7

2. I feel little concern for others. 1 2 3 4 5 6 7

3. I am always prepared. 1 2 3 4 5 6 7

4. I get stressed out easily. 1 2 3 4 5 6 7

5. I have a rich vocabulary. 1 2 3 4 5 6 7

6. I don't talk a lot. 1 2 3 4 5 6 7

7. I am interested in people. 1 2 3 4 5 6 7

8. I leave my belongings around. 1 2 3 4 5 6 7

9. I am relaxed most of the time. 1 2 3 4 5 6 7

10. I have difficulty understanding abstract ideas. 1 2 3 4 5 6 7

11. I feel comfortable around people. 1 2 3 4 5 6 7

12. I insult people. 1 2 3 4 5 6 7

13. I pay attention to details. 1 2 3 4 5 6 7

14. I worry about things. 1 2 3 4 5 6 7

15. I have a vivid imagination. 1 2 3 4 5 6 7

16. I keep in the background. 1 2 3 4 5 6 7

17. I sympathize with others' feelings. 1 2 3 4 5 6 7

18. I make a mess of things. 1 2 3 4 5 6 7

19. I seldom feel blue. 1 2 3 4 5 6 7

20. I am not interested in abstract ideas. 1 2 3 4 5 6 7

21. I start conversations. 1 2 3 4 5 6 7

22. I am not interested in other people's problems. 1 2 3 4 5 6 7

23. I get chores done right away. 1 2 3 4 5 6 7

24. I am easily disturbed. 1 2 3 4 5 6 7

25. I have excellent ideas. 1 2 3 4 5 6 7



26. I have little to say. 1 2 3 4 5 6 7

27. I have a soft heart. 1 2 3 4 5 6 7

28. I often forget to put things back in their proper place. 1 2 3 4 5 6 7

29. I get upset easily. 1 2 3 4 5 6 7

30. I do not have a good imagination. 1 2 3 4 5 6 7

31. I talk to a lot of different people at parties. 1 2 3 4 5 6 7

32. I am not really interested in others. 1 2 3 4 5 6 7

33. I like order. 1 2 3 4 5 6 7

34. I change my mood a lot. 1 2 3 4 5 6 7

35. I am quick to understand things. 1 2 3 4 5 6 7

36. I don’t like to draw attention to myself. 1 2 3 4 5 6 7

37. I take time out for others. 1 2 3 4 5 6 7

38. I shirk my duties. 1 2 3 4 5 6 7

39. I have frequent mood swings. 1 2 3 4 5 6 7

40. I use difficult words. 1 2 3 4 5 6 7

41. I don’t mind being the center of attention. 1 2 3 4 5 6 7

42. I feel others’ emotions. 1 2 3 4 5 6 7

43. I follow a schedule. 1 2 3 4 5 6 7

44. I get irritated easily. 1 2 3 4 5 6 7

45. I spend time reflecting on things. 1 2 3 4 5 6 7

46. I am quiet around strangers. 1 2 3 4 5 6 7

47. I make people feel at ease. 1 2 3 4 5 6 7

48. I am exacting in my work. 1 2 3 4 5 6 7

49. I often feel blue. 1 2 3 4 5 6 7

50. I am full of ideas. 1 2 3 4 5 6 7


Example - Overall Job Satisfaction Scale from Michelle Hinton Watson's Thesis

The items in this scale are presented as questions. In other instances, they are presented as statements. If presented as statements, the responses would represent amount of agreement.

If you need an overall job satisfaction scale, you may use this.

For each statement please put a check ( ) in the space showing how you feel about the following aspects of your job. This time, indicate how satisfied you are with the following things about your job.

(1) Very Dissatisfied
(2) Moderately Dissatisfied
(3) Slightly Dissatisfied
(4) Neither Satisfied nor Dissatisfied
(5) Slightly Satisfied
(6) Moderately Satisfied
(7) Very Satisfied

VD  MD  SD  N  SS  MS  VS
 1   2   3   4   5   6   7

Overall Satisfaction

27. How satisfied do you feel with your chances for getting ahead in this organization? ( ) ( ) ( ) ( ) ( ) ( ) ( )

32. All in all, how satisfied are you with the persons in your work group? ( ) ( ) ( ) ( ) ( ) ( ) ( )

35. All in all, how satisfied are you with your supervisor? ( ) ( ) ( ) ( ) ( ) ( ) ( )

37. All in all, how satisfied are you with this organization, compared to most others? ( ) ( ) ( ) ( ) ( ) ( ) ( )

43. How satisfied do you feel with the progress you have made in this organization up to now? ( ) ( ) ( ) ( ) ( ) ( ) ( )

45. Considering your skills and the effort you put into the work, how satisfied are you with your pay? ( ) ( ) ( ) ( ) ( ) ( ) ( )

50. All in all, how satisfied are you with your job? ( ) ( ) ( ) ( ) ( ) ( ) ( )


Why do we have multi-item scales?

1. Precision, a first reason for using multiple-item scales.

Consider a single item in a scale, “I am satisfied with my job.” Now consider the true positions of several respondents, represented by the positions of the top arrows:

VD  MD  SD  N  SS  MS  VS
 1   2   3   4   5   6   7

27. All in all, how satisfied are you with your job? ( ) ( ) ( ) ( ) ( ) ( ) ( )

The response labels put there by the test constructor represent points on a continuum. They're like 1-foot marks on a scale of height.

So, in the above situation each respondent except the rightmost one would respond MS, which would be assigned the value 6. But the rightmost respondent, whose actual position on the dimension is close to his/her nearest neighbor's, would pick VS, creating a considerable “response” distance from that neighbor.

Since each respondent can pick only one of the response CATEGORIES, any response made may miss the respondent’s true amount of satisfaction by about 7 percent on a 7-point scale, by about 10 percent on a 5-point scale.

Note the wide range of actual feelings which would be represented by a 6 above.

Consider that two persons very close in their actual feelings about the job could get scores a full response category apart. E.g., a person whose actual feeling is 6.55 would check 7. But a person whose actual feeling was 6.45 would check 6. The difference of 1 would be much greater than the actual difference of .1 in actual feeling. See red arrows above.

This situation is analogous to one that most students have strong feelings about – the use of 5 grades to represent performance in a course. We all remember those instances in which we missed the next higher grade mark by a 10th of a point. The use of a single item with just a few response categories is analogous.

Solution: Use multiple items. While each one may miss its mark considerably, some of the misses will be positive and some negative, so they tend to cancel, and the average of the responses will be very close to the respondent's true position on the continuum.

Conclusion: Having multiple items and averaging responses to the multiple items increases accuracy of identification of the respondent’s true position on a dimension.
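The cancellation argument can be illustrated with a small simulation (a sketch with made-up noise values, not data from the lecture): each single categorical response misses the true position, but the mean of ten such responses lands much closer.

```python
import random

random.seed(0)

TRUE_POSITION = 6.45  # the respondent's actual feeling on the 1-7 continuum

def respond() -> int:
    """One categorical response: the true position plus a random 'miss',
    rounded to the nearest of the 7 response categories."""
    noisy = TRUE_POSITION + random.gauss(0, 0.8)
    return max(1, min(7, round(noisy)))

# Average absolute error of a single item vs. the mean of ten items
singles = [respond() for _ in range(5000)]
tens = [sum(respond() for _ in range(10)) / 10 for _ in range(5000)]

err_single = sum(abs(r - TRUE_POSITION) for r in singles) / len(singles)
err_mean_of_ten = sum(abs(m - TRUE_POSITION) for m in tens) / len(tens)
# err_mean_of_ten comes out well below err_single
```

The single response can never be closer than .45 to a true position of 6.45; the ten-item mean routinely is.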



2. Reliability, a second reason for using multiple-item scales.

Since a single categorical item response involves only a gross approximation to the actual (True) feeling, on repeated measurement, a person giving only one response might get a very different score (6 vs. 7, for example) on a single item. This reduces reliability. Reducing reliability reduces estimated validity. Reducing estimated validity reduces your chances of getting published.

Conclusion: Summing or averaging responses to multiple items results in a measure that is inherently more reliable.

3. Ability to use internal consistency to assess Reliability.

It is possible to assess the reliability of multiple items scales in a single administration of the scale to a group by computing coefficient alpha. That is not possible with a single-item scale.

Conclusion: Using multiple items and basing the scale score on the sum or mean greatly facilitates our ability to estimate the reliability of the scale score.
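Coefficient alpha itself is a one-formula computation. A minimal sketch in plain Python (standing in for SPSS's RELIABILITY procedure), using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total score):

```python
# Cronbach's coefficient alpha from a persons-by-items response matrix.

def cronbach_alpha(data: list[list[float]]) -> float:
    """data: one row per person, one column per item."""
    k = len(data[0])   # number of items
    n = len(data)      # number of persons

    def var(xs):  # sample variance (n-1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [var([row[j] for row in data]) for j in range(k)]
    total_var = var([sum(row) for row in data])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

With perfectly parallel items alpha is 1.0; as item intercorrelations drop, alpha drops.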

4. Insulation from the effects of idiosyncratic items.

Sometimes, a respondent will have a unique reaction to the wording of a single item. This reaction may be based on the respondent’s history or understanding of that item. If that item is the only item in the scale, then the respondent’s position on the dimension will be greatly distorted by that reaction.

Conclusion: Including multiple items and using the sum or mean of responses to them diminishes the influence of any one idiosyncratic item.

Come on!!! What’s not to like about using multiple items???

1. Test length.

2. Overestimation of reliability by alpha from using too-similar items.


Issues in development of Likert/Summated scales

1. Do you have to ask for agreement?

The original idea was to assess agreement. But now, other ordered characteristics are used. E.g., level of satisfaction, strength of feeling, accuracy with which a statement describes you, etc.

2. How many response categories should be employed?

2-11. Seven or more is preferable. Spector on p. 21 recommends 5-9. There are 3 reasons to use 7 or more.

a. If your study will involve level shifts between conditions, you should allow plenty of room to shift, which means using 7- or 9-point scales.

b. If you plan to use confirmatory factor analysis or structural equation models on your data, seven or more response options per item is preferred for reasons associated with factor analysis.

c. We've obtained results suggesting that inconsistency of responding is relatively independent of response level when 7-point scales are used.

Below are correlations of inconsistency (vertical) vs. level (horizontal) for 5-point scales on the left and 7-point on the right. X-axis is individual person scores. Y-axis is standard deviations of items making up the score.

Nhung Honest 5 point response scale; r = -.35

Nhung faking 5 point response scale; r=-.43

Vikus 5 point response scale; r = -.49

Bias Study IPIP – 5 point response scale; r = -.32

Bias Study NEO-FFI 5 point response scale; r = -.34


Incentive Honest – 7 point response scale; r = -.03

Incentive Faking 7 point response scale; r = +.08

FOR Study Gen 7 point response scale; r = -.04

FOR Study FOR 7 point response scale; r = +.08

Worthy Thesis 7 point response scale; r=-.18

3. Should there be a neutral category?

I am not familiar with a clear-cut, strong argument either way. I prefer one. If you analyze the data using Confirmatory Factor Analysis or Structural Equation Modeling, it doesn't matter.

My guess (and it's just a guess) is that you'll get a few more failures to respond without one, from people who just can't make up their minds. And variability of responses might be slightly smaller with one, from those same people responding in the middle. But I'm not aware of a meta-analysis on this issue.

4. What numeric values should be assigned to each response possibility for analyses based on sums or means?

Although at one time there were arguments for scaling the various response alternatives, now almost everyone who analyzes the data traditionally uses successive, equally spaced integers. The values need not be successive, but everyone uses successive integers, as opposed to, say, every other integer.

For example

1 = Strongly Disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly Agree

Or

1 = Strongly Disagree, 2 = Moderately Disagree, 3 = Disagree, 4 = Neutral, 5 = Agree, 6 = Moderately Agree, 7 = Strongly Agree

Newer Confirmatory Factor Analysis and Structural Equation Modeling based analyses that assume the data are “ordered categorical” require simply that the response categories be ordered. No numeric assignment is required.

5. If the analyses are based on sums or means, which integers should be used?

Answer: Any set will do. They should be successive integers.

1 to 5 or 1 to 7
0 to 4 or 0 to 6
-2 to +2 or -3 to +3
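The reason any of these sets will do: they differ only by a constant shift, and shifting every response by a constant changes neither differences between people nor correlations among scale scores. A quick check with illustrative numbers (not data from the lecture):

```python
# Demonstrate that recoding 1-5 responses as -2..+2 leaves the
# correlation with another variable unchanged.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

scale_a = [1, 2, 2, 4, 5, 3, 4]      # responses coded 1 to 5
other   = [2, 3, 1, 5, 4, 3, 5]      # some other variable
scale_b = [v - 3 for v in scale_a]   # same responses coded -2 to +2

# pearson(scale_a, other) and pearson(scale_b, other) are identical
```

Only the mean shifts (by the constant); every deviation score, and hence every correlation, is untouched.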

6. Does agreement have to be high or low numbers?

Yes, the God of statistics will strike you down if you make small numbers indicate more of ANY construct. Being a golfer will not save you.

I strongly prefer assigning numbers so that a bigger response value represents more of the construct as it is named. I’m sure it’s what the God of Statistics intended.


7. What about including negatively worded items, perhaps better labeled as “opposite idea” items?

Negatively worded items may be included, although there is no guarantee that responses to negatively worded items will be the actual negation of what the responses to a positively worded counterpart would have been.

I like my supervisor vs. I dislike my supervisor.

Responses to these two items should be perfectly negatively correlated, but often they are not.

Many studies have found that negatively worded items are responded to similarly to other negatively worded items, regardless of content or dimension, presumably due to the negative wording itself. We have found this in seven datasets.

We've also found that positively worded items are responded to similarly regardless of content, simply because they're positively worded.

Recommendation:

Best: Design and, using factor analysis, analyze your questionnaire so that it permits estimation of the bias tendencies. Estimate a general factor, a positively-worded item factor, and a negatively-worded item factor. Treat these three factors as separate indicators of the construct. Nobody does this now, because such wording-related response tendencies are still being investigated.

Expedient: Have an equal number of positively-worded and negatively-worded items and average across wordings to cancel out differences in response tendencies associated with wording.


8. If negatively worded items are included, how should they be scored?

Typically, negatively worded items are reverse-scored and then they’re treated as if they had been positively worded.

Example for items with 5 categories and values 1,2,3,4, and 5.

Original  Reversed
   1         5
   2         4
   3         3
   4         2
   5         1

Suppose Q1 = “I like my job”
Suppose Q7 = “I don't like to come to work in the morning” (a negatively-worded item for job satisfaction)

Data matrix:

Person  Q1  Q7  Q7R  Scale as sum  Scale as mean
  1      5   2   4     9 = 5+4         4.5
  2      4   1   5     9 = 4+5         4.5
  3      1   5   1     2 = 1+1         1
  4      2   4   2     4 = 2+2         2
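The same scoring can be sketched in a few lines of Python (using the item names and values from the data matrix above):

```python
# Reverse-score a negatively worded item on a 1-5 response format,
# then form the scale score as the sum or the mean.

def reverse(value: int, low: int = 1, high: int = 5) -> int:
    """1<->5, 2<->4, 3 stays 3 for a 1-5 format."""
    return low + high - value

# (Q1, Q7) responses for the four persons in the data matrix
responses = [(5, 2), (4, 1), (1, 5), (2, 4)]

scores = []
for q1, q7 in responses:
    q7r = reverse(q7)                           # e.g., 2 becomes 4
    scores.append((q1 + q7r, (q1 + q7r) / 2))   # (sum, mean)
# scores: (9, 4.5), (9, 4.5), (2, 1.0), (4, 2.0)
```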

9. Should the scale score be the sum or the mean of items?

If there are no missing values, the sum and the mean will be perfectly correlated – they’re mathematically equivalent, so you can use either.

The mean is more easily related to the questionnaire items if they all have the same response format.

SPSS’s RELIABILITY procedure computes only the sum.

If there are missing values, use the mean of available items or use imputation techniques to be described next to impute missing values, after which it won’t matter whether you use the mean or sum.


10. What about missing responses?

There are several possibilities

a. Listwise deletion. Generally not preferred but if only a couple out of 200 are missing, use it.

Person  Q1  Q2  Q3     After listwise deletion:  Person  Q1  Q2  Q3
  1      5   4   5                                 1      5   4   5
  2      2   _   3                                 3      3   2   4
  3      3   2   4                                 4      1   2   1
  4      1   2   1

Problem: Can decimate the dataset. You may be left with only highly conscientious, agreeable participants, because that kind of participant is the only kind that will respond to all the items.

b. Item mean substitution. Substitute the mean of other persons' responses to the same item for the missing item. Not recommended.

Person  Q1  Q2   Q3     Substituted:  Person  Q1  Q2   Q3
  1      5   4    5                     1      5   4    5
  2      2   _    3                     2      2   2.7  3   <- mean of 4, 2, & 2 substituted
  3      3   2    4                     3      3   2    4
  4      1   2    1                     4      1   2    1

c. Person scale mean substitution. Substitute the mean of the other items from the same scale that the person responded to for the missing item. Assume Q1, Q2, and Q3 are three items forming a scale. Not recommended.

Person  Q1  Q2   Q3     Substituted:  Person  Q1  Q2   Q3
  1      5   4    5                     1      5   4    5
  2      2   _    3                     2      2   2.5  3   <- mean of 2 & 3 substituted
  3      3   2    4                     3      3   2    4
  4      1   2    1                     4      1   2    1
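For concreteness, the two simple substitutions above can be sketched as follows (using None for the missing response; as noted, neither method is recommended):

```python
# Item mean vs. person scale mean substitution for a missing response.

data = [  # rows = persons, columns = Q1, Q2, Q3
    [5, 4, 5],
    [2, None, 3],
    [3, 2, 4],
    [1, 2, 1],
]

def item_mean(data, col):
    """Mean of the other persons' responses to the same item."""
    vals = [row[col] for row in data if row[col] is not None]
    return sum(vals) / len(vals)

def person_mean(row):
    """Mean of the other items the person responded to."""
    vals = [v for v in row if v is not None]
    return sum(vals) / len(vals)

# Item mean substitution for person 2's Q2: (4 + 2 + 2) / 3, about 2.7
# Person scale mean substitution for person 2: (2 + 3) / 2 = 2.5
```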

d. Use a more sophisticated imputation technique. Several are available in SPSS.

I have often used SPSS's imputation techniques.

e. The conventional wisdom is changing on issues of missing values. Many modern statistical techniques are designed to work with all available data. These techniques do not include REGRESSION and GLM.

11. Writing the items. Spector, p. 23...

a. Each item should involve only one idea. Example of a double-barreled item: “The death penalty should be abolished because it's against religious law.”

b. Avoid colloquialisms, jargon. E.g., “I am the life of the party.” “I shirk my duties.”

c. Consider the reading level of the respondent.

d. Avoid using “not” to create negatively worded items.

Good: Communication in my organization is poor.
Bad: Communication in my organization is not good.


e. Avoid items that might trigger emotional responses in certain samples.

The effect of self-presentation tendencies

Suppose two independent constructs are being measured using summated rating scales. Suppose each construct was measured with a two-item scale using a 6-valued response format consisting of the values 1 through 6.

Suppose 16 persons participated, giving the following matrix of responses.

Person  Q1  Q2  Q3  Q4
  1      2   2   2   2
  2      2   2   3   3
  3      2   2   4   4
  4      2   2   5   5
  5      3   3   2   2
  6      3   3   3   3
  7      3   3   4   4
  8      3   3   5   5
  9      4   4   2   2
 10      4   4   3   3
 11      4   4   4   4
 12      4   4   5   5
 13      5   5   2   2
 14      5   5   3   3
 15      5   5   4   4
 16      5   5   5   5

For these hypothetical data, Q1 and Q2 are perfectly correlated, as are Q3 and Q4. Obviously, items within the same scale are not perfectly correlated in real life.

But Q1+Q2 are uncorrelated with Q3+Q4. The constructs are independent.

compute C1=mean(Q1,Q2).
compute C2=mean(Q3,Q4).
correlate c1 with c2.

True correlation between the constructs, C1 and C2:

Correlations: C1 with C2, Pearson Correlation = .000, Sig. (2-tailed) = 1.000, N = 16


(The syntax above creates the two construct scale scores: C1 for Construct 1 and C2 for Construct 2.)

Now, suppose that the odd-numbered participants were people who preferred the low end of whatever response scale they were filling out, while the even numbered participants were people who preferred the high end of whatever scale they were filling out. Obviously, our participants don’t separate into odd-even groups like this, but they do separate. There ARE people who prefer the high end of the response scale and there ARE people who prefer the low end.

For example, many personality items have valence – agreeing indicates you're “good”; disagreeing indicates you're “not good”. We think that those who are feeling good about themselves will tend to choose the agreement end of the response scale and those who are not feeling so good about themselves will tend to choose the disagreement end. We believe that some people respond to unobtrusive content – for example, the valence – of the items.

Assume that the response tendency results in a bias of 1 response value down in the case of those who tend to disagree or up in the case of those who tend to agree.

Person  BiasedQ1  BiasedQ2  BiasedQ3  BiasedQ4
  1        1         1         1         1
  2        3         3         4         4
  3        1         1         3         3
  4        3         3         6         6
  5        2         2         1         1
  6        4         4         4         4
  7        2         2         3         3
  8        4         4         6         6
  9        3         3         1         1
 10        5         5         4         4
 11        3         3         3         3
 12        5         5         6         6
 13        4         4         1         1
 14        6         6         4         4
 15        4         4         3         3
 16        6         6         6         6

Now the correlation between Q1+Q2 and Q3+Q4 is .555, a value that is statistically significant.

The point of this is that differences in participants' response tendencies (e.g., the tendency of some to use only the upper part of a response scale while others use the lower part of the scale) can result in positive correlations between constructs that are in fact, uncorrelated.
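The whole demonstration can be reproduced in a few lines (plain Python standing in for the SPSS run above):

```python
# Two uncorrelated two-item constructs, then the same responses shifted
# down 1 for odd-numbered participants and up 1 for even-numbered ones.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# True responses: Q1 = Q2 and Q3 = Q4, all 16 combinations of 2..5 x 2..5
true_rows = [(q1, q1, q3, q3) for q1 in (2, 3, 4, 5) for q3 in (2, 3, 4, 5)]

# Participant 1 is odd (index 0), participant 2 is even, and so on
biased_rows = [
    tuple(v - 1 if (i % 2 == 0) else v + 1 for v in row)
    for i, row in enumerate(true_rows)
]

def construct_scores(rows):
    c1 = [(r[0] + r[1]) / 2 for r in rows]
    c2 = [(r[2] + r[3]) / 2 for r in rows]
    return c1, c2

r_true = pearson(*construct_scores(true_rows))      # exactly 0
r_biased = pearson(*construct_scores(biased_rows))  # about .555
```

A one-point response-style shift in opposite directions for the two groups is enough to manufacture the .555 correlation.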

This is a problem that has been referred to as the method bias problem. The term refers to the fact that correlations between constructs obtained using the same method are biased upward. It plagues the use of summated rating scales. Many journals will not accept research in which the independent and the dependent variables are measured using the same method.


The process of creating a summated scale.

1. Define/conceptualize the construct to be measured.

2. See if someone else has already created a scale measuring that construct. If so, and if it appears OK, don’t re-invent the wheel. Faculty. Buros Institute. IPIP web site. Google.

http://buros.org/mental-measurements-yearbook

Remember . . .

3. If you must develop your own, begin by generating items.

4. Have a sample of SMEs rate the extent to which each item represents the construct. Keep only the best.

5. Administer items to a pilot sample from the population of interest.

6. Perform item analysis of the responses of the pilot sample.

a. Assess reliability.
b. Identify bad items, those that reduce reliability, and eliminate them.
c. Assess dimensionality using exploratory factor analysis.

All items in the same scale should represent the same dimension and no other dimension.

7. Perform a validation study assessing convergent and discriminant validity using the population of interest, perhaps using the pilot sample.

a. Administer other similar scales.
b. Administer other discriminating scales.

Kayitesi Wilt's thesis – a validation study of the Cultural Intelligence Scale (CQS).

She compared the mean CQS scores of persons who've been abroad and enjoyed that travel vs. those who've been abroad and not enjoyed it. That's convergent validity.

She administered the CQS, an Emotional Intelligence Scale, a Social Intelligence Scale, and a Big Five questionnaire. She assessed discriminant validity of the CQS with respect to the other scales. It should not be highly correlated with any of them. That's discriminant validity.

8. Administer to a sample from the population of interest along with the other scales that are part of your research project.

Assess the theoretical relationships of interest to you.

9. Publish the scale and get rich.


Example of processing items of a scale in SPSS

This example is taken from an independent study project conducted by Lyndsay Wrensen examining factors related to faking of the Big 5 personality inventory. She administered the IPIP Big 5 inventory twice – once under instructions to respond honestly and again (counterbalanced) under instructions to respond as if seeking a customer service job.

The data here are the honest condition responses to the Extroversion scale. Participants read each item and indicate how accurately it described them using 1=Very inaccurate to 5=Very accurate. Some of the items were negatively worded. We now would use a 7-point response scale. This project was done almost 10 years ago.

Extroversion item responses before reverse-scoring the negatively worded items.


1. Reverse-score the negatively-worded items.

SPSS Syntax to reverse-score negatively worded items.

recode he2 he4 he6 he8 he10 (1=5)(2=4)(3=3)(4=2)(5=1) into he2r he4r he6r he8r he10r.
execute.

Or you can do the reverse scoring manually or using pull-down menus.

However you do it, put the reverse-scored values in columns that are different from the originals.

Extroversion item responses after reverse-scoring the negatively worded items.


2. Deal with missing data. (Not illustrated here.)

For example, use SPSS’s imputation features. Set up a time with me, and I’ll walk you through the process.

3. Perform reliability analyses.

Analyze -> Scale -> Reliabilities

Reliability

Warnings: The covariance matrix is calculated and used in the analysis.

Case Processing Summary
               N      %
Valid         179    90.4
Excluded(a)    19     9.6
Total         198   100.0
a. Listwise deletion based on all variables in the procedure.


Note that the reverse-scored items are the ones that are included in the RELIABILITY analysis.

Reliability Statistics
Cronbach's Alpha   Cronbach's Alpha Based on Standardized Items   N of Items
     .859                            .860                            10

Item Statistics

3.13 1.122 179

3.97 .908 179

3.72 1.093 179

3.34 1.277 179

3.41 1.216 179

3.56 1.254 179

3.27 1.136 179

3.79 1.110 179

2.74 1.224 179

2.70 1.285 179

he1

he3

he5

he7

he9

he2r

he4r

he6r

he8r

he10r

Mean Std. Dev iation N

Inter-Item Correlation Matrix
        he1    he3    he5    he7    he9    he2r   he4r   he6r   he8r   he10r
he1    1.000   .334   .245   .518   .542   .367   .435   .211   .336   .296
he3     .334  1.000   .473   .372   .331   .354   .367   .368   .170   .383
he5     .245   .473  1.000   .553   .329   .421   .407   .488   .242   .519
he7     .518   .372   .553  1.000   .427   .391   .404   .446   .198   .375
he9     .542   .331   .329   .427  1.000   .382   .496   .258   .461   .295
he2r    .367   .354   .421   .391   .382  1.000   .550   .572   .331   .448
he4r    .435   .367   .407   .404   .496   .550  1.000   .375   .371   .450
he6r    .211   .368   .488   .446   .258   .572   .375  1.000   .241   .417
he8r    .336   .170   .242   .198   .461   .331   .371   .241  1.000   .210
he10r   .296   .383   .519   .375   .295   .448   .450   .417   .210  1.000

The covariance matrix is calculated and used in the analysis.

Summary Item Statistics
                          Mean   Minimum  Maximum  Range  Maximum/Minimum  Variance  N of Items
Item Means               3.363    2.698    3.972   1.274       1.472         .180        10
Item Variances           1.363     .825    1.650    .825       2.000         .065        10
Inter-Item Correlations   .381     .170     .572    .402       3.363         .010        10

Item-Total Statistics
        Scale Mean if   Scale Variance if   Corrected Item-Total   Squared Multiple   Cronbach's Alpha
        Item Deleted      Item Deleted          Correlation          Correlation      if Item Deleted
he1        30.50            50.162                .547                 .456               .848
he3        29.66            52.496                .516                 .312               .851
he5        29.92            49.504                .612                 .504               .842
he7        30.29            47.724                .610                 .501               .842
he9        30.22            48.691                .586                 .449               .844
he2r       30.07            47.501                .638                 .489               .839
he4r       30.36            48.546                .649                 .455               .839
he6r       29.84            50.080                .560                 .443               .847
he8r       30.89            51.309                .417                 .273               .859
he10r      30.93            48.501                .557                 .375               .847


New Directions in Measurement of Psychological Constructs

1) Measurement Using Factor Scores from factor analyses

Factor scores are measures obtained from factor analyses of items.

Factor scores are computed by differentially weighting each item according to its contribution to the indication of the dimension. Items which are not highly correlated with the dimension are given little weight. Those which are highly correlated with the dimension are given more weight.

Note that summated scale scores are computed by equally weighting each item that is thought to be relevant. So a summated scale score is a crude factor score.

The loadings of the items on the factor are used to determine the weights.
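A toy sketch of the contrast (the responses and loadings are hypothetical; real factor score coefficients are derived from the loadings together with the item covariances, e.g., via the regression method, so this is a simplification):

```python
# Equal-weight summated score vs. a loading-weighted score over
# standardized items.

def zscores(xs):
    m = sum(xs) / len(xs)
    sd = (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5
    return [(x - m) / sd for x in xs]

# Hypothetical responses (rows = persons) and loadings for 3 items
items = [
    [5, 4, 5],
    [2, 3, 3],
    [3, 2, 4],
    [1, 2, 1],
]
loadings = [0.8, 0.6, 0.4]  # hypothetical: item 1 indicates the factor best

cols = [zscores([row[j] for row in items]) for j in range(3)]

summated = [sum(row) / 3 for row in items]  # every item weighted equally
weighted = [
    sum(l * cols[j][i] for j, l in enumerate(loadings))
    for i in range(len(items))
]
```

Both scores order the extreme persons the same way here; they diverge when items differ sharply in how well they indicate the dimension.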

Advantages of factor scores

They probably better capture the dimension of interest. They’re probably more highly correlated with the dimension than the simple sum of items.

They can be computed taking into account other factors that might influence the items, thus may be uncontaminated by the other factors.

Disadvantages of factor scores

They are harder to compute, requiring a factor analysis program.

The weights will differ from sample to sample so your weighting scheme based on your sample will differ from my weighting scheme based on my sample.

2) Using techniques based on Item Response Theory.

Item Response Theory (IRT) is a statistical theory of how people respond to items and how to score the items. It is kind of like factor analysis, but the underlying theory is different from that used in factor analysis.

IRT methods are used by most large-scale test publishers, such as ETS, ACT, Pearson. IRT methods routinely incorporate ideas that are not usually considered by persons using summated scales.

If you’re serious about measurement, you’ll have to learn a lot about both factor analytic methods and IRT methods.


3) Using the whole buffalo: Measuring other aspects of responses to questionnaires, such as inconsistency.

Virtually all scales are scored to represent the level of responses to items representing a dimension.

So, a Conscientiousness score is the average level, the mean of the responses of a person to the Conscientiousness items in a questionnaire.

What about the variability of a person’s responses to the C items?

We’ve been exploring the relationships of Inconsistency of responding, as measured by the standard deviation of persons’ responses to items from the same dimension.

Recent data (Reddock, Biderman, & Nguyen, 2011).

Overall UGPA was the criterion. Conscientiousness and Variability were predictors.

Results with Conscientiousness scale scores from the FOR condition

Both standardized coefficients are significant at p < .01.

These data suggest that Inconsistency of Responding may be a valid predictor of certain types of performance.

References

Reddock, C. M., Biderman, M. D., & Nguyen, N. T. (2011). The relationship of reliability and validity of personality tests to frame-of-reference instructions and within-person inconsistency. International Journal of Selection and Assessment, 19, 119-131.

For example: If you give a Big 5 questionnaire to a group of respondents, you can measure the following 11 attributes:

Extraversion, Agreeableness, Conscientiousness, Stability, Openness
General Affect, Positive Wording Bias, Negative Wording Bias
Inconsistency, Extreme Response Tendency, Acquiescent Response Tendency


[Path diagram: FORCon -> UGPA, standardized coefficient .250; Inconsistency -> UGPA, -.219; R = .315]