CHAPTER 5: Test Scores as Composites
This chapter is about the quality of items in a test.

TRANSCRIPT

  • Slide 1
  • 1 CHAPTER 5 Test Scores as Composites This Chapter is about the Quality of Items in a Test.
  • Slide 2
  • Test Scores as Composites. What is a composite test score? A composite test score is a total test score created by summing two or more subtest scores, e.g., the WAIS-IV Full Scale IQ consists of the 1-Verbal Comprehension Index, 2-Perceptual Reasoning Index, 3-Working Memory Index, and 4-Processing Speed Index. Qualifying Examinations and the EPPP are also composite test scores.
  • Slide 3
  • Item Scoring Schemes (Systems). There are two different scoring systems: 1. Dichotomous scores are restricted to 0 and 1, such as scores on true/false and multiple-choice questions. 2. Non-dichotomous scores are not restricted to 0 and 1; they can take a range of possible points (1, 2, 3, 4, 5, ...), such as scores on essays.
  • Slide 4
  • Dichotomous Scheme Examples. 1. The space between nerve cell endings is called the: a. Dendrite, b. Axon, c. Synapse, d. Neutron. (In this item, responses a, b, and d are scored 0; response c is scored 1.) 2. Teachers in public school systems should have the right to strike: a. Agree, b. Disagree. (In this item, a response of Agree is scored 1; Disagree is scored 0.) Or, you can use True or False.
  • Slide 5
  • Practical Implications for Test Construction. Variance and covariance measure the quality of items in a test; reliability and validity measure the quality of the entire test. Variance is the degree to which scores vary around the mean: σ² = SS/N, computed on one set of data.
  • Slide 6
  • Practical Implications for Test Construction. Correlation is based on a statistic called covariance (Cov_xy or S_xy): Cov_xy = SP/(N − 1), computed on two sets of data. Covariance is a number that reflects the degree to which two variables vary together. The correlation is r = SP / √(SS_x · SS_y).
  • Slide 7
  • Variance. SS is the sum of squared deviations from the mean, computed either as SS = Σ(x − x̄)² or as SS = Σx² − (Σx)²/N. Population variance: σ² = SS/N. Sample variance: s² = SS/(n − 1) = SS/df.
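As a quick check of these formulas, here is a minimal Python sketch; the data values are hypothetical.

```python
# Sum of squared deviations (SS) and variance, computed both ways.
x = [1, 3, 2, 6, 4, 4, 5, 7]  # hypothetical scores

n = len(x)
mean_x = sum(x) / n

# Definitional formula: SS = sum of (x - mean)^2
ss_definitional = sum((xi - mean_x) ** 2 for xi in x)
# Computational formula: SS = sum(x^2) - (sum(x))^2 / N
ss_computational = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n

pop_variance = ss_definitional / n           # sigma^2 = SS / N
sample_variance = ss_definitional / (n - 1)  # s^2 = SS / (n - 1) = SS / df

print(ss_definitional, ss_computational, pop_variance, sample_variance)
```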
  • Slide 8
  • Covariance. Covariance is a number that reflects the degree to which two variables vary together. Original data (X, Y): (1, 3), (2, 6), (4, 4), (5, 7).
  • Slide 9
  • Covariance. Cov_xy = SP/(N − 1). There are two ways to calculate SP: SP = Σxy − (Σx·Σy)/N, or SP = Σ(x − x̄)(y − ȳ). SP requires two sets of data; SS requires only one set of data.
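Using the X, Y data from the previous slide, a minimal Python sketch of SP, the covariance, and the correlation:

```python
# Covariance and correlation for the paired data from slide 8.
x = [1, 2, 4, 5]
y = [3, 6, 4, 7]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Two equivalent ways to compute SP (sum of products of deviations)
sp_definitional = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sp_computational = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

cov_xy = sp_definitional / (n - 1)  # Cov_xy = SP / (N - 1)

ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)
r = sp_definitional / (ss_x * ss_y) ** 0.5  # r = SP / sqrt(SS_x * SS_y)

print(sp_definitional, sp_computational, cov_xy, r)
```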
  • Slide 10
  • 10 Descriptive Statistics for Dichotomous Data
  • Slide 11
  • 11 Descriptive Statistics for Dichotomous Data Item Variance & Covariance
  • Slide 12
  • Descriptive Statistics for Dichotomous Data. P = item difficulty: P = (# of examinees who answered the item correctly) / (total # of examinees), i.e., P = f/N (see handout). The higher the P value, the easier the item.
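A minimal sketch of this computation on 0/1 item scores; the response matrix is hypothetical.

```python
# Item difficulty P = f / N for dichotomously scored items (1 = correct, 0 = incorrect).
# Each inner list is one examinee's scores on five items (hypothetical data).
responses = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 0, 0],
    [0, 1, 0, 1, 0],
]

n_examinees = len(responses)
n_items = len(responses[0])

for item in range(n_items):
    f = sum(person[item] for person in responses)  # examinees answering correctly
    p = f / n_examinees                            # P = f / N
    print(f"Item {item + 1}: P = {p:.2f}")         # higher P means an easier item
```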
  • Slide 13
  • Relationship between Item Difficulty P and Variance. [Figure: item variance (a quality indicator) plotted against item difficulty P, ranging from 0 (difficult) to 1 (easy); the variance is greatest near P = 0.5 and shrinks toward the extremes.]
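The shape of that curve can be checked directly. For a dichotomous item the item variance is p·q = p(1 − p) (a standard result, not written out on the slide), which is largest at p = 0.5; a quick sketch:

```python
# Item variance for a dichotomous item: p * q = p * (1 - p).
# It is 0 at p = 0 or p = 1 and peaks at p = 0.5.
for p in [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]:
    print(f"p = {p:.1f}  item variance = {p * (1 - p):.2f}")
```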
  • Slide 14
  • Non-dichotomous Score Examples. 1. Write a grammatically correct German sentence using the first person singular form of the verb verstehen. (A maximum of 3 points may be awarded, and partial credit may be given.) 2. An intellectually disabled person is a nonproductive member of society: 5. Strongly agree, 4. Agree, 3. No opinion, 2. Disagree, 1. Strongly disagree. (Scores can range from 1 to 5 points, with high scores indicating a positive attitude toward intellectually disabled citizens.)
  • Slide 15
  • 15 Descriptive Statistics for Non-dichotomous Variables
  • Slide 16
  • 16 Descriptive Statistics for Non-dichotomous Variables
  • Slide 17
  • Variance of a Composite, σ²_C. σ²_a = SS_a/N_a, σ²_b = SS_b/N_b, and σ²_C = σ²_a + σ²_b. Ex. from the WAIS-III: FSIQ = VIQ + PIQ. If there are more than 2 subtests, σ²_C = σ²_a + σ²_b + σ²_c: calculate the variance for each subtest and add them up.
  • Slide 18
  • Variance of a Composite, σ²_C. What is the composite test score? Ex. the WAIS-IV Full Scale IQ, which consists of the a-Verbal Comprehension Index, b-Perceptual Reasoning Index, c-Working Memory Index, and d-Processing Speed Index. With more than 2 subtests: σ²_C = σ²_a + σ²_b + σ²_c + σ²_d.
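A minimal sketch of the rule as stated on these two slides, summing the subtest variances; the scores and the variance helper below are hypothetical, for illustration only.

```python
# Composite variance computed as the sum of subtest variances, per the slide's rule.
# Hypothetical scores on two subtests, a and b, for the same five examinees.
subtest_a = [10, 12, 9, 14, 11]
subtest_b = [20, 18, 22, 25, 19]

def variance(scores):
    """Population variance: sigma^2 = SS / N."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((s - mean) ** 2 for s in scores) / n

composite_variance = variance(subtest_a) + variance(subtest_b)
print(composite_variance)
```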
  • Slide 19
  • Suggestions to Increase the Total Score Variance of a Test: 1. Increase the number of items in the test. 2. Keep item difficulties (p) in the medium range. 3. Items with similar content have higher correlations and higher covariance. 4. Item score and total score variances alone are not indices of test quality (reliability and validity).
  • Slide 20
  • 1. Increase the Number of Items in a Test (how to calculate the test variance). The variance of a 25-item test is higher than the variance of a 20-item test: σ²_test = N·σ²_x + N(N − 1)·Cov_x, where Cov_x is the item covariance, σ²_x is the item variance, and N is the number of items in the test. Ex. if Cov_x = 0.10 and σ²_x = 0.20, then for N = 20 the test variance is 20(0.20) + 20(19)(0.10) = 42, and for N = 25 it is 25(0.20) + 25(24)(0.10) = 65.
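The worked example can be reproduced directly; a minimal sketch:

```python
# Total test variance from the slide's formula:
#   sigma^2_test = N * var_item + N * (N - 1) * cov_item
def test_variance(n_items, var_item=0.20, cov_item=0.10):
    return n_items * var_item + n_items * (n_items - 1) * cov_item

print(test_variance(20))  # 42.0 for a 20-item test
print(test_variance(25))  # 65.0 for a 25-item test
```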
  • Slide 21
  • 21 2-Item Difficulties Item difficulties should be almost equal for all of the items and difficulty levels should be in the medium range.
  • Slide 22
  • 22 3-Items with Similar Content have Higher Correlations & Higher Covariance
  • Slide 23
  • 4. Item Score and Total Score Variances Alone are not Indices of Test Quality. Variance and covariance are important and necessary; however, they are not sufficient to determine test quality. To determine a higher level of test quality we use reliability and validity.
  • Slide 24
  • UNIT II RELIABILITY CHAP 6: RELIABILITY AND THE CLASSICAL TRUE SCORE MODEL CHAP 7: PROCEDURES FOR ESTIMATING RELIABILITY CHAP 8: INTRODUCTION TO GENERALIZABILITY THEORY CHAP 9: RELIABILITY COEFFICIENTS FOR CRITERION-REFERENCED TESTS 24
  • Slide 25
  • CHAPTER 6: Reliability and the Classical True Score Model. Reliability (ρ) is a measure of consistency/dependability: a test measures the same thing more than once and produces the same outcome. Reliability refers to the consistency of examinees' performance over repeated administrations of the same test or parallel forms of the test (Linda Crocker text).
  • Slide 26
  • THE MODERN MODELS
  • Slide 27
  • TYPES OF RELIABILITY (type of reliability: what it is; how you do it; what the reliability coefficient looks like)
    - Test-Retest (2 administrations): a measure of stability; administer the same test/measure at two different times to the same group of participants; r_test1.test2 (e.g., an IQ test).
    - Parallel/Alternate (Equivalent) Forms (2 administrations): a measure of equivalence; administer two different forms of the same test to the same group of participants; r_testA.testB (e.g., a stats test).
    - Test-Retest with Alternate Forms (2 administrations): a measure of stability and equivalence; on Monday, administer form A to the 1st half of the group and form B to the 2nd half; on Friday, administer form B to the 1st half and form A to the 2nd half.
    - Inter-Rater (1 administration): a measure of agreement; have two raters rate behaviors and then determine the amount of agreement between them; percentage of agreement (see the sketch after this table).
    - Internal Consistency (1 administration): a measure of how consistently each item measures the same underlying construct; correlate performance on each item with overall performance across participants; Cronbach's alpha, Kuder-Richardson, split-half, or Hoyt's method.
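The inter-rater percentage of agreement in the table is a simple computation; a minimal sketch with hypothetical ratings:

```python
# Inter-rater reliability as the percentage of agreement between two raters.
# Hypothetical behavior ratings (1 = behavior observed, 0 = not observed).
rater_1 = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
rater_2 = [1, 0, 1, 0, 0, 1, 1, 1, 1, 1]

agreements = sum(a == b for a, b in zip(rater_1, rater_2))
percent_agreement = 100 * agreements / len(rater_1)
print(f"{percent_agreement:.0f}% agreement")  # 80% for these ratings
```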
  • Slide 28
  • Test-Retest: class IQ scores (X = 1st time on Mon, Y = 2nd time on Fri). John: 125, 120; Jo: 110, 112; Mary: 130, 128; Kathy: 122, 120; David: 115, 120.
  • Slide 29
  • Parallel/Alternate Forms: scores on 2 forms of a stats test (Form A, Form B). John: 95, 92; Jo: 84, 82; Mary: 90, 88; Kathy: 76, 80; David: 81, 78.
  • Slide 30
  • Test-Retest with Alternate Forms. On Monday, administer form A to the 1st half of the group and form B to the 2nd half. Form A, 1st group (Mon): David 85, Mary 94, Jo 78, John 81, Kathy 67. Form B, 2nd group (Mon): Mark 82, Jane 95, George 80, Mona 80, Maria 70. (Continued on the next slide.)
  • Slide 31
  • Test-Retest with Alternate Forms. On Friday, administer form B to the 1st half of the group and form A to the 2nd half. Form B, 1st group (Fri): David 85, Mary 94, Jo 78, John 81, Kathy 67. Form A, 2nd group (Fri): Mark 82, Jane 95, George 80, Mona 80, Maria 70.
  • Slide 32
  • HOW RELIABILITY IS MEASURED. Reliability is measured with a correlation coefficient, r_test1.test2 or r_x.y. Reliability coefficients indicate how scores on one test change relative to scores on a second test. They range from 0.00 to 1.00: 1.00 = perfect reliability, 0.00 = no reliability.
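As an illustration, the test-retest coefficient for the class IQ scores on slide 28 is just the Pearson correlation between the two administrations; a minimal sketch:

```python
# Test-retest reliability as the Pearson correlation between two administrations.
x = [125, 110, 130, 122, 115]  # John, Jo, Mary, Kathy, David on Monday (slide 28)
y = [120, 112, 128, 120, 120]  # the same students on Friday

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)

r = sp / (ss_x * ss_y) ** 0.5  # r_test1.test2
print(round(r, 2))
```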
  • Slide 33
  • THE CLASSICAL MODEL
  • Slide 34
  • A CONCEPTUAL DEFINITION OF RELIABILITY (CLASSICAL MODEL). Observed Score = True Score + Error Score, i.e., X = T + E, where the error score comprises method error and trait error.
  • Slide 35
  • Classical Test Theory. The observed score, X = T + E: X is the score you actually record or observe on a test. The true score, T = X − E: the difference between the observed score and the error score; T reflects the examinee's true knowledge. The error score, E = X − T: the difference between the observed score and the true score; E comprises the factors that cause the true score and the observed score to differ.
  • Slide 36
  • A CONCEPTUAL DEFINITION OF RELIABILITY: (X) the Observed Score. X = T + E: the score that is actually observed. It consists of two components, the true score and the error score (method error and trait error).
  • Slide 37
  • A CONCEPTUAL DEFINITION OF RELIABILITY: the True Score. T = X − E: a perfect reflection of the true value for the individual; a theoretical score. (Observed Score = True Score + Error Score, with the error made up of method error and trait error.)
  • Slide 38
  • A CONCEPTUAL DEFINITION OF RELIABILITY. Method error is due to characteristics of the test or testing situation; trait error is due to individual characteristics. Conceptually, Reliability = True Score / Observed Score = True Score / (True Score + Error Score). The reliability of the observed score becomes higher as error is reduced.
  • Slide 39
  • A CONCEPTUAL DEFINITION OF RELIABILITY: the Error Score. E = X − T: the difference between the observed and true scores. X = T ± E: for example, 95 = 90 + 5 or 85 = 90 − 5; the difference between T and X is 5 points, so E = 5.
  • Slide 40
  • The Classical True Score Model: X = T + E, where X represents the observed test score, T represents the individual's true score (true knowledge), and E represents the random error component.
  • Slide 41
  • Classical Test Theory: What Makes up the Error Score? E = X − T. The error score consists of: 1. Method error: the difference between true and observed scores resulting from the test or testing situation. 2. Trait error: the difference between true and observed scores resulting from the characteristics of the examinees. See the next slide.
  • Slide 42
  • 42 What Makes up the Error Score?
  • Slide 43
  • Expected Value of the True Score. Definition: the true score is defined as the expected value of the examinee's test scores (the mean of the observed scores) over many repeated testings with the same test.
  • Slide 44
  • Error Score. Definition: the error scores for an examinee over many repeated testings should average to zero: ε(E_j) = ε(X_j) − T_j = T_j − T_j = 0, where ε(E_j) is the expected value of the error and T_j is the examinee's true score. Example on the next slide.
  • Slide 45
  • Error Score. X − E = T: the difference between the observed score and the error score is the true score (all scores are from the same examinee): 98 − 8 = 90, 88 + 2 = 90, 80 + 10 = 90, 100 − 10 = 90, 95 − 5 = 90, 81 + 9 = 90, 88 + 2 = 90, 90 − 0 = 90. The error terms sum to zero: −8 + 2 + 10 − 10 − 5 + 9 + 2 − 0 = 0.
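These numbers can be verified directly; a minimal sketch:

```python
# Repeated observed scores for one examinee (from the slide); the true score is 90.
observed = [98, 88, 80, 100, 95, 81, 88, 90]

true_score = sum(observed) / len(observed)   # the mean of the observed scores estimates T
errors = [x - true_score for x in observed]  # E = X - T for each administration

print(true_score)   # 90.0
print(sum(errors))  # the errors sum to zero
```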
  • Slide 46
  • INCREASING THE RELIABILITY OF A TEST (meaning decreasing error): 7 steps. 1. Increase the sample size (n). 2. Eliminate unclear questions. 3. Standardize testing conditions. 4. Moderate the degree of difficulty of the tests (P). 5. Minimize the effects of external events. 6. Standardize instructions (directions). 7. Maintain consistent scoring procedures (use a rubric).
  • Slide 47
  • 47 *Increasing Reliability of your Items in a Test
  • Slide 48
  • 48 *Increasing Reliability Cont..
  • Slide 49
  • How Reliability (ρ) is Measured for an Item/Score. ρ = True Score / (True Score + Error Score), or ρ = T/(T + E), with 0 ≤ ρ ≤ 1. Note: in this formula you always add the error (the difference between T and X) to the true score in the denominator, whether it is positive or negative, so ρ = T/(T + |E|).
  • Slide 50
  • Which item has the highest reliability? The maximum for this question is 10 points; ρ = T/(T + |E|):
    E = +2, T = 8: 8/10 = 0.80
    E = −3, T = 6: 6/9 = 0.667
    E = +7, T = 1: 1/8 = 0.125
    E = −1, T = 9: 9/10 = 0.90
    E = +4, T = 6: 6/10 = 0.60
    E = −4, T = 6: 6/10 = 0.60
    E = +1, T = 7: 7/8 = 0.875
    E = 0, T = 10: 10/10 = 1.00
    E = −5, T = 4: 4/9 = 0.444
    E = +6, T = 3: 3/9 = 0.333
    The more error, the lower the reliability.
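A minimal sketch that reproduces the table above and picks out the most reliable item score:

```python
# Reliability of a single item score: p = T / (T + |E|), as used on the slide.
# Each pair is (error E, true score T) from the table above; the maximum is 10 points.
scores = [(+2, 8), (-3, 6), (+7, 1), (-1, 9), (+4, 6),
          (-4, 6), (+1, 7), (0, 10), (-5, 4), (+6, 3)]

for e, t in scores:
    p = t / (t + abs(e))
    print(f"E = {e:+d}, T = {t}: p = {p:.3f}")

# The score with no error (E = 0, T = 10) has the highest reliability, p = 1.0.
best = max(scores, key=lambda et: et[1] / (et[1] + abs(et[0])))
print("Highest reliability:", best)
```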