The Measurement Properties of the Teaching Strategies GOLD™ Assessment System:
Preliminary Results Following the Spring Assessment Checkpoint
Richard G. Lambert and Do-Hong Kim
Center for Educational Measurement and Evaluation The University of North Carolina at Charlotte
June 2010
Sample
The total sample for the third phase of the Teaching Strategies GOLD™ assessment system field test included 2,465 children. The children in this nationally representative sample received preschool services in 40 different centers that are located across the United States. Most of these centers use The Creative Curriculum® for Preschool and had been assessing children by using The Creative Curriculum® Developmental Continuum for Ages 3–5 before this study. A total of 165 different raters (teachers) provided the ratings for the study. Each teacher received training in the use of the Teaching Strategies GOLD™ assessment system and rated an average of 15 children.
The children in the study ranged in age from 4–72 months. The percentages of children in the sample in each 6-month age-group are reported in Table 1. The sample was split almost exactly evenly between boys (50.1 percent) and girls (49.9 percent). English was the primary language spoken in the homes of 45.8 percent of the children. Spanish was the primary language spoken in the homes of 47.4 percent of the children. The remaining 6.8 percent of the children lived in homes where the primary language spoken was one of 29 other languages. Of all the children in the sample, 7.8 percent have disabilities. Children with an Individualized Family Service Plan (IFSP) comprised 1.4 percent of the sample, and children with an Individualized Education Program (IEP) comprised 6.4 percent of the sample.
The children spanned the entire age range for which the assessment system is intended (birth through kindergarten). It is important to note that the field test analyses discussed in this report were conducted with unweighted data. Children who are English-language learners were oversampled intentionally.
Validity
Factor Analysis
The first step of the validity analysis was exploratory factor analysis using Principal Axis Factoring and direct oblimin rotations. A five-factor solution accounted for 83.64 percent of the variance in item responses. The results of this solution are reported in Table 2. Simple structure was clearly achieved in this solution; no item had loadings of greater than .40 on more than one factor. In addition, four of the five factors exactly matched the intended domain of development according to the way the items were organized theoretically by the tool developers. Each of the items related to the Cognitive, Social–Emotional, Physical, and Language domains of development loaded on a factor with all of the other items within their respective domains. The only exceptions were the literacy and mathematics items. These items loaded together on one factor rather than two.
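The simple-structure criterion described above (no item loading above .40 on more than one factor) can be expressed as a small check. The following is only a sketch: the function name and the example matrix are hypothetical, and the secondary loadings are invented because the report does not list them.

```python
def cross_loading_items(loadings, cutoff=0.40):
    """Return labels of items whose absolute loading exceeds the cutoff on more than one factor."""
    flagged = []
    for item, row in loadings.items():
        # Count the factors on which this item loads saliently.
        salient = [value for value in row if abs(value) > cutoff]
        if len(salient) > 1:
            flagged.append(item)
    return flagged


# Hypothetical pattern matrix with five factors per item.
# The primary values for items 1a and 4 echo Table 2; every other number is invented.
example = {
    "1a": [0.05, 0.02, 0.822, 0.10, 0.08],
    "4": [0.12, 0.03, 0.20, 0.484, 0.15],
    "made-up item": [0.45, 0.50, 0.10, 0.05, 0.02],
}

print(cross_loading_items(example))
```

An empty list returned for the full loading matrix would correspond to the simple structure reported here.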
Rasch Analysis
Data were also analyzed by using the Rasch Rating Scale Model (Andrich, 1978) and Winsteps software (Linacre, 2009). A separate Rasch analysis was conducted for each of the five domains of development identified in the factor analysis. Results of the Rasch principal components analysis of residuals (PCAR) showed that the variance in the data explained by the Rasch measures for each of the five scale scores ranged from 83.2–88.7 percent, and the largest secondary dimension accounted for only 2.5–5.2 percent of the unexplained variance. These results, which are reported in Table 3 by scale score, clearly satisfy the Rasch model assumption of unidimensionality. For PCAR, a variance of greater than 50 percent explained by measures is considered good support for scale unidimensionality.
Overall, the rating scale functioned effectively for each of the scale scores. Specifically, the average measure score increased with the category level. The thresholds advanced with the categories, indicating that category 0 is most likely to be observed for children who have relatively lower ratings on the items, whereas category 9 is most likely to be observed for children who have relatively higher ratings. The only exception to this finding was item 6 on the Physical scale, for which disordinality was found between ratings of 0 and 1. This item focuses on gross-motor development. Future research with the measure may focus on whether further refinement of either the behavioral anchors or training for ratings of 0 and 1 is warranted for this item. The full range of ratings, 0–9, was used by the teachers when rating this sample for all but two of the items. Only the range 0–7 was used for items 19a and 19b, which are more difficult items related to writing.
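The category behavior described above can be illustrated with the category probabilities of the Andrich rating scale model. This is a minimal sketch, not the Winsteps estimation procedure: the function name, the person and item measures, and the nine evenly spaced thresholds are all hypothetical values chosen for illustration.

```python
import math


def rsm_category_probabilities(theta, delta, thresholds):
    """Return P(X = k) for k = 0..M under the Andrich rating scale model.

    theta: person measure in logits
    delta: item location in logits
    thresholds: M step thresholds shared by all items on the scale
    """
    # Cumulative sums of (theta - delta - tau_j); category 0 has an empty sum of 0.
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(value - peak) for value in cumulative]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical ordered thresholds: nine steps give ten categories (0-9).
example_thresholds = [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]

low = rsm_category_probabilities(-6.0, 0.0, example_thresholds)
high = rsm_category_probabilities(6.0, 0.0, example_thresholds)

# With ordered thresholds, the lowest category is modal for a low measure
# and the highest category is modal for a high measure.
print(low.index(max(low)), high.index(max(high)))
```

When thresholds are disordered, as reported for ratings of 0 and 1 on item 6, the intermediate category may never be the most probable rating at any point on the scale.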
With very few exceptions, the fit statistics for all of the items were well within acceptable limits. Mean-square fit values between 0.6 and 1.4 are considered reasonable for rating scale items (Bond & Fox, 2007). For the Social–Emotional items, Infit mean-square values ranged from .77–1.13, and Outfit mean-square values ranged from .73–1.10. For the Physical items, Infit mean-square values ranged from .87–1.23, and Outfit mean-square values ranged from .79–1.12. For the Language items, Infit mean-square values ranged from .82–1.24, and Outfit mean-square values ranged from .83–1.23. For the Cognitive items, Infit mean-square values ranged from .81–1.31, and Outfit mean-square values ranged from .78–1.32. For the Literacy and Mathematics items, Infit mean-square values ranged from .71–1.58, and Outfit mean-square values ranged from .69–1.52. Only two items had fit statistics that were out of the suggested acceptable range: 16a (Identifies and names letters) and 19a (Writes name). These results may suggest that it would be helpful during rater training to place greater emphasis on how to collect the appropriate artifacts to facilitate the determination of these ratings.
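For readers unfamiliar with these statistics, the sketch below computes Infit and Outfit mean squares for one item from observed ratings, model-expected ratings, and model variances, using the conventional residual-based definitions. The report does not describe the Winsteps computations in detail, so the function names and the input numbers are illustrative assumptions. Only the 0.6 to 1.4 range comes from the text.

```python
def fit_mean_squares(observed, expected, variances):
    """Return (infit, outfit) mean squares for one item across persons."""
    squared_residuals = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / w for r, w in zip(squared_residuals, variances)) / len(observed)
    # Infit: information-weighted, so well-targeted observations count more.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


def is_reasonable(mean_square, low=0.6, high=1.4):
    """Apply the range cited in the text for rating scale items."""
    return low <= mean_square <= high


# Invented ratings, expectations, and variances for three children.
infit, outfit = fit_mean_squares([3, 5, 4], [3.5, 4.5, 4.0], [1.0, 1.0, 0.5])

# The largest Literacy and Mathematics values reported above fall outside the range,
# while the largest Cognitive values fall inside it.
print(is_reasonable(1.58), is_reasonable(1.52), is_reasonable(1.31))
```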
Item locations within each scale score analysis generally matched the distribution of person locations. The thresholds for the steps in the rating scale across items extended beyond the full range of person locations, suggesting that each of the measures can be used to collect information that can discriminate between children at all levels of development.
Within each domain-specific Rasch analysis, the item location hierarchy appeared to be consistent with the expected developmental trajectory for children who are developing typically. These results can be interpreted as strong construct validity evidence for the scale scores. The rating scale scores indicate that the tool’s developmental indicators are presented in the sequence that matches the progressions of development and learning that the children in the sample are in fact following. In addition, each of the scale scores was moderately highly correlated with child age in months (r = .704 to .740). These results suggest that, while there is some expected variability in developmental levels for children of the same age, older children tend to receive higher ratings than younger children across all domains.
Reliability
The internal consistency reliability for each of the scale scores was high, with Cronbach’s alpha coefficients ranging from .961 to .986. These results, along with the Rasch reliability indexes, are reported in Table 4. Based on the Rasch reliability indexes, each of the five scales also appears to be highly reliable, as evidenced by person separation indexes of 3.330–7.270; person reliabilities of .920–.980; item separation indexes of 22.110–40.300; and item reliabilities of nearly 1.000 for all scales.
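The two kinds of reliability evidence above can be sketched in a few lines. The first function uses the standard relation between a separation index G and reliability, R = G² / (1 + G²), which reproduces the person reliabilities in Table 4 to two decimals. The second computes Cronbach's alpha from a children-by-items matrix. The function names and the tiny score matrix are hypothetical, and the alpha function assumes complete data and sample variances.

```python
def reliability_from_separation(separation):
    """Convert a Rasch separation index G to a reliability coefficient."""
    return separation ** 2 / (1.0 + separation ** 2)


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one per child) of item ratings."""
    n_items = len(scores[0])

    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    total_variance = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1.0 - sum(item_variances) / total_variance)


# Person separation indexes from Table 4.
person_separation = {
    "Social-Emotional": 5.310,
    "Physical": 3.330,
    "Language": 5.630,
    "Cognitive": 6.720,
    "Literacy and Mathematics": 7.270,
}
for domain, g in person_separation.items():
    print(domain, round(reliability_from_separation(g), 2))

# Invented ratings for three children on two perfectly consistent items.
toy_alpha = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
```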
Summary
Reliability statistics indicate that the Teaching Strategies GOLD™ assessment system is highly reliable. Factor analysis shows that the ratings load onto the constructs as intended by the tool-development team. Analyses of the dimensionality of each scale score suggest that the Teaching Strategies GOLD™ assessment system ratings measure five distinct domains of development and that each satisfies the Rasch model assumption of unidimensionality. The fit statistics suggest that the data are a good fit for the Rasch rating scale model. These results also strongly suggest that teachers are able to make valid ratings of the developmental progress of children across the intended age range, from birth through kindergarten.
Table 1 Participating Children by Age
Age (Months) n %
0–6 6 0.24
7–12 35 1.42
13–18 46 1.87
19–24 63 2.56
25–30 55 2.23
31–36 72 2.92
37–42 91 3.69
43–48 332 13.47
49–54 434 17.61
55–60 546 22.15
61–66 644 26.13
67–72 141 5.72
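As a quick consistency check on Table 1, the counts can be summed and converted to percentages. The sketch below uses only the values printed in the table; the variable names are illustrative.

```python
# Age band (months) -> (n, reported percent), copied from Table 1.
table_1 = {
    "0-6": (6, 0.24),
    "7-12": (35, 1.42),
    "13-18": (46, 1.87),
    "19-24": (63, 2.56),
    "25-30": (55, 2.23),
    "31-36": (72, 2.92),
    "37-42": (91, 3.69),
    "43-48": (332, 13.47),
    "49-54": (434, 17.61),
    "55-60": (546, 22.15),
    "61-66": (644, 26.13),
    "67-72": (141, 5.72),
}

total = sum(n for n, _ in table_1.values())

# Recompute each percentage from the counts and compare with the reported value.
recomputed = {band: 100.0 * n / total for band, (n, _) in table_1.items()}
for band, (n, reported) in table_1.items():
    print(band, n, reported, round(recomputed[band], 2))
```

The counts sum to the 2,465 children described in the Sample section.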
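Table 2 Structure Coefficients From Exploratory Factor Analysis
Item   Domain of Development       Structure Coefficient
1a     Social–Emotional            0.822
1b     Social–Emotional            0.833
1c     Social–Emotional            0.641
2a     Social–Emotional            0.630
2b     Social–Emotional            0.795
2c     Social–Emotional            0.865
2d     Social–Emotional            0.800
3a     Social–Emotional            0.872
3b     Social–Emotional            0.690
4      Physical                    0.484
5      Physical                    0.491
6      Physical                    0.413
7a     Physical                    0.481
7b     Physical                    0.418
8a     Language                    0.609
8b     Language                    0.469
9a     Language                    0.813
9b     Language                    0.863
9c     Language                    0.903
9d     Language                    0.766
10a    Language                    0.865
10b    Language                    0.646
11a    Cognitive                   0.810
11b    Cognitive                   0.893
11c    Cognitive                   0.810
11d    Cognitive                   0.899
11e    Cognitive                   0.842
12a    Cognitive                   0.700
12b    Cognitive                   0.711
13     Cognitive                   0.595
14a    Cognitive                   0.652
14b    Cognitive                   0.521
15a    Literacy and Mathematics    0.654
15b    Literacy and Mathematics    0.751
15c    Literacy and Mathematics    0.809
16a    Literacy and Mathematics    0.901
16b    Literacy and Mathematics    0.994
17a    Literacy and Mathematics    0.577
17b    Literacy and Mathematics    0.765
18a    Literacy and Mathematics    0.554
18b    Literacy and Mathematics    0.675
18c    Literacy and Mathematics    0.577
19a    Literacy and Mathematics    0.524
19b    Literacy and Mathematics    0.669
20a    Literacy and Mathematics    0.649
20b    Literacy and Mathematics    0.707
20c    Literacy and Mathematics    0.835
21a    Literacy and Mathematics    0.450
21b    Literacy and Mathematics    0.573
22     Literacy and Mathematics    0.633
23     Literacy and Mathematics    0.602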
Table 3 Rasch Principal Components Analysis of Residuals
Percent Variance Explained
Domain of Development       Rasch Rating Scale   Largest Secondary Dimension
Social–Emotional            85.9                 3.5
Physical                    87.3                 5.2
Language                    88.7                 2.7
Cognitive                   87.5                 2.6
Literacy and Mathematics    83.2                 2.5
Table 4 Reliability Evidence and Correlation With Age by Scale Score
Domain of Development       Number of Items   Cronbach’s Alpha   Person Separation Index   Person Reliability   Item Separation Index   Item Reliability   Correlation With Age in Months
Social–Emotional            9                 0.975              5.310                     0.970                30.110                  0.999              0.704
Physical                    5                 0.961              3.330                     0.920                22.110                  0.999              0.740
Language                    8                 0.976              5.630                     0.970                23.530                  0.999              0.718
Cognitive                   10                0.983              6.720                     0.980                22.130                  0.999              0.730
Literacy and Mathematics    19                0.986              7.270                     0.980                40.300                  0.999              0.720
© 2010 Teaching Strategies, Inc. All Rights Reserved.