Prioritizing Growth But Underutilizing Growth Scales:
Implications of Advances in Growth Modeling for Educational Policy and Practice
Understanding and supporting growth in student achievement has garnered much attention
among practitioners, policymakers, and researchers. This interest is based in part on well-
documented shortcomings of only considering a student’s achievement at a point in time when
making decisions about students, teachers, and schools (Darling-Hammond, 2007; Duckworth,
Quinn, & Tsukayama, 2012; Kim & Sunderman, 2005).
Despite emphasis placed on student growth, educational practice still tends to rely primarily on
static achievement. For example, most states participating in the federal Race to the Top
competition identified the bottom 5% of schools and restructured many of those schools based on
achievement from a single point in time. Even when policies and practices do include estimates
of growth, they tend to rely on fairly coarse approaches like comparing separate cohorts of
students over time or standardizing test scores and comparing rank orderings between two
timepoints.
In this symposium, we introduce innovative growth modeling techniques to address several high-
profile issues in education policy and practice that revolve around making better inferences about
student learning using standardized achievement data. The initial paper in the symposium lays
out a new approach to modeling the seasonality of vertically-scaled achievement test data, which
often grows curvilinearly from fall to spring, but can drop off between spring and fall of the
following academic year. This Compound Polynomial (CP) model simultaneously estimates a
smooth growth curve over time and the jagged spring-to-fall drops. The remaining three papers
either apply the CP directly or use related growth modeling techniques to examine specific issues
of practice and policy. First, the CP model is used to estimate summer learning loss, a major
concern among policymakers. Most previous research on summer learning loss only compares
test scores from spring and fall, but does not account for growth trends outside of summer break.
Second, school value-added models are estimated using a student growth model, a very different
approach to quantifying school effectiveness than the traditional models that regress a post-test
score on a pre-test score (both standardized). This work shows that school effectiveness
estimates between the two approaches are quite different, and provides evidence that value-
added estimates can give educators useful information not always available when scale
properties are removed from the test through standardization or use of ordinal models. Third,
dynamic measurement modeling is used with vertically scaled longitudinal data to estimate a
student's predicted capacity asymptote, which provides evidence on a student's capacity to learn a
subject. Results suggest that impoverished students, despite having developed less mathematics
ability on average than their more privileged peers by 8th grade, nonetheless retain a practically
equal capacity for learning within that domain in the future.
Altogether, these papers show that using innovative methods in conjunction with the vertically
scaled assessment data that states and districts are currently collecting can generate inferences about
educational policies that are useful to educators and policymakers. These inferences cannot
typically be made under the common practice of standardizing test scores, using ordinal models,
or otherwise doing analyses that do not harness vertical scales.
Citations
Darling-Hammond, L. (2007). Race, inequality and educational accountability: The irony of ‘No
Child Left Behind.’ Race Ethnicity and Education, 10(3), 245–260.
Duckworth, A. L., Quinn, P. D., & Tsukayama, E. (2012). What No Child Left Behind leaves behind:
The roles of IQ and self-control in predicting standardized achievement test scores and report
card grades. Journal of Educational Psychology, 104(2), 439–451.
Kim, J. S., & Sunderman, G. L. (2005). Measuring Academic Proficiency under the No Child Left
behind Act: Implications for Educational Equity. Educational Researcher, 34(8), 3–13.
Paper 1
Title: Modeling Growth by Adding Curves: The Compound Polynomial for Seasonal Time-
series
Background / Context:
While many alternative growth functions are available for describing learning change
over time, the choice of a model depends heavily on key graphical features of a given set of
data. In this paper, we question the appropriateness of the conventional polynomial growth
curve, given as

$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_p x_i^p + \varepsilon_i \qquad (1)$

for score $y_i$ observed at time $x_i$ (of order $p$), for assessment, or other, data that exhibit clear
seasonality or cycles.
An example of data series that exhibits a marked form of seasonality is the Measures of
Academic Progress, or MAP Growth®, an interim assessment which is offered by Northwest
Evaluation Association to elementary and high school students across the US. When we focus on
aggregate trends, we find patterns of mathematics and reading score means (in RITs or Rasch
Units) for a typical district, as shown in Figure 1. Notice the prominent pattern of mean
score drops from the Spring term of one grade to the Fall term of the next, the so-called “summer
drop-offs.” Growth appears as a chain of upward-tilting chevron curve segments from the lower
to the upper grade-levels.
Also shown are the fitted conventional polynomials (quadratic), indicating how inadequately
they navigate the observed data points. We focus on conventional polynomials not only because
they are familiar to analysts but also because they are often an effective option for many educational
applications. It is evident that, if conventional polynomials are employed for the above data, the
drawbacks are familiar: inefficient estimates due to serially correlated errors, higher-order model
terms that are hard to interpret, and poor data-model fit with clearly discernible prediction bias.
Purpose:
First introduced in Thum and Hauser (2015) for developing growth norms for MAP Growth
assessments, the compound polynomial, or CP, has been employed in other growth modeling of
MAP Growth data by Thum and Matta (2015a, 2015b, 2016) and by Thum and Soland (2017). In
this paper, we present the basic principle of the CP and, with the help of selected analyses from
past applications, show that the CP is to be preferred over the conventional polynomial.
While the original motivation, and discovery, of the CP comes from a successful attempt to fit
NWEA MAP Growth data better, it turns out that the effect of adding “primary curves” to
achieve a more desirable “overall curve” has a long history, including Bernoulli’s study of
harmonics and Thomas Young’s attempt to give a better account of the interference of light and
sound waves toward the latter half of the 18th century (Kipnis, 1991). Figure 2 depicts the
addition of y1, y2, and y3, three sound waves of varying amplitudes, frequencies, and phases.
Note how the compound wave departs starkly from the general character of each of its
contributors.
Setting / Population / Participants:
This study employs student mathematics RIT scores from the Northwest Evaluation Association
MAP Growth® assessment, a computerized adaptive test that is used by over 6,000 districts
across the US. We analyze all available fall, winter, and spring MAP mathematics RIT scores
(592,305 in total) for a random sample of 1,443 schools serving 130,077 students who attended
grades 3, 4, and 5.
Method / Research Design:
To describe seasonal or cyclical data series, such as that depicted in Figure 1, we employ a
weighted sum of several cross-segment polynomial functions. Each component describes how a
within-segment polynomial coefficient changes over segments. For the general case with
$g = 1, 2, \ldots, G$ segments, $n_g$ observations in segment $g$, and a within-segment polynomial
$\mathbf{P}$ of degree $K \le \min(n_g)$, the CP is given by

$(\mathbf{I}_G \otimes \mathbf{P})\left(\mathbf{Q}_1 \otimes \mathbf{u}_1 + \mathbf{Q}_2 \otimes \mathbf{u}_2 + \cdots + \mathbf{Q}_k \otimes \mathbf{u}_k + \cdots + \mathbf{Q}_K \otimes \mathbf{u}_K\right), \qquad (2)$

where $\mathbf{Q}_k$ is the between-segment polynomial model for the $k$th within-segment term and
$\mathbf{u}_k$ is a $K \times 1$ null vector with a "1" in row $k$, for $k = 1, 2, \ldots, K$. Modifications to Equation 2
needed to trace primary effects such as Spring-Fall summer loss, to accommodate a centering of time, or
to vary the inter-point spacing of time (to reflect a varying number of instructional weeks, for
example) are minimal, and they will be detailed in the completed paper.
Returning to the earlier motivating example from MAP Growth, what are added together in a CP
(see Figure 3) are a polynomial capturing the change in Fall mean scores over grade-levels
(y1, left vertical axis) and a second polynomial describing the change of Fall-Spring gains over
grade-levels (y2, right vertical axis). The result of adding two continuous curves is a surprising
near-discontinuous growth curve that better describes the summer loss typically observed in
interim assessments when compared with the conventional polynomial growth curve.
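To make this construction concrete, the following R sketch (ours, not the authors' code) builds a small CP design of the kind just described: a quadratic between-grade polynomial for Fall status plus a within-grade linear term held constant across grades. The within-year time coding, the grade centering, and all object names are illustrative assumptions.

```r
# Hedged sketch of a CP design matrix (not the authors' implementation).
# Assumes 10 term-level means (Fall grade 3 through Fall grade 6).
terms <- expand.grid(term = c("F", "W", "S"), grade = 3:6)[1:10, ]  # F3, W3, S3, ..., F6
terms$grade_c <- terms$grade - 4                                    # center between-grade time at grade 4
terms$within  <- c(F = 0, W = 0.5, S = 1)[as.character(terms$term)] # assumed position within the school year

X <- with(terms, cbind(int        = 1,          # Fall-status intercept
                       fall_lin   = grade_c,    # Fall-status linear change across grades
                       fall_quad  = grade_c^2,  # Fall-status quadratic change across grades
                       within_lin = within))    # within-grade (Fall-to-Spring) growth

# Given a vector y of the 10 observed term means (e.g., the RIT means in Table 1),
# the CP can be fit by least squares: fit <- lm(y ~ X - 1); summary(fit)
```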
Findings / Results:
Table 1 displays MAP Growth mathematics means and instructional days for Fall, Winter, and
Spring terms in grades 4, 5, and 6 to be employed in our illustrative example. Table 2 displays
the estimation results for a CP model with those from a regular polynomial model describing the
change in mathematics score means over 10 consecutive terms from the Fall of grade 3 through
Fall of grade 6.
We find that a null model shows a strong sample first-order auto-correlation of 0.612. We also
find that the best-fitting regular polynomial model is one that is linear in time, with a residual
sum-of-squares of 59.317 and an auto-correlation estimate of 0.320. In contrast, the best-fitting
CP model indicates that Fall results are quadratic over grade-levels and that the within-grade
trend is linear and constant over grades 3 to 6. The residual sum-of-squares for the CP model is
much lower at 8.311. Equally important, as the auto-correlation estimate of 0.042 indicates,
residuals are no longer correlated. Figure 4 displays the observed data, predicted results, and the
residuals for this illustration.
Conclusions:
We present a new class of growth curves obtained as the weighted sum of cross-segment
polynomial functions, defined for each segment-specific polynomial coefficient. We show that
CP curves are surprisingly flexible in form and thus provide a better fit to seasonal or
cyclical data. With the help of several examples, we also show how clinically useful parameters,
i.e., parameters that convey key and meaningful aspects of growth, may be obtained.
Paper 1 Appendix A.
References
Kipnis, N. (1991). History of the Principle of Interference of Light. Basel, Boston and Berlin:
Birkhauser Verlag.
Thum Y. M., & Hauser, C. H. (2015). NWEA 2015 MAP Norms for Student and School
Achievement Status and Growth. NWEA Research Report, Portland, OR: NWEA.
Thum, Y.M., & Matta, T. (2015a, May). Predicting College Readiness from Interim Assessment
Results: Selection Modeling for Longitudinal Data. Paper presented at the Modern
Modeling Methods (M3) Conference, Neag School of Education, University of
Connecticut, CT.
Thum Y. M., & Matta, T. (2015b). MAP College Readiness Benchmarks: A Research Brief.
NWEA Research Report. Portland, OR: NWEA.
Thum, Y. M., & Matta, T. (2016, April). Fitting Curves to Data Series with Seasonality using the
Additive Polynomial (AP) Growth Model. Presented at the Annual Meeting of the
National Council on Measurement in Education, Washington, DC.
Thum, Y. M., & Soland, J. (2017, March). School Norms for Mathematics Achievement Status,
Term-to-term Growth, and the Gender Gap. Paper presented at the SREE 2017 Spring
Conference, Washington, DC.
Paper 1 Appendix B. Tables and Figures
Table 1. Grade 4 MAP Mathematics Means (Midwestern State) and the
Number of Instructional Days by School Year and Term
School Year   Term   Avg. RIT   Days
2011          F         192       15
2011          W         199       81
2011          S         206      155
2012          F         204      192
2012          W         210      259
2012          S         216      332
2013          F         213      368
2013          W         218      438
2013          S         225      511
2014          F         221      547
Table 2. Comparing a conventional polynomial with a compound (or additive)
polynomial: Grade 4 MAP Mathematics Means (Midwestern State).
Conventional Polynomial* (error DF = 8)
                                      Est.      s.e.
Intercept (4th Grade Fall)            205.550   0.973
Linear                                  0.279   0.025
Residual SS                            59.317
Durbin-Watson d                         2.262
Sample 1st-order Auto-correlation      -0.320

Compound Polynomial (error DF = 6)
Within-Grade   Between-Grade             Est.      s.e.
Fall Status    Intercept (4th Grade)     204.022   0.343
Fall Status    Linear                     10.099   0.223
Fall Status    Quadratic                  -0.478   0.206
Linear         Intercept (4th Grade)       0.464   0.016
Residual SS                                8.311
Durbin-Watson d                            1.841
Sample 1st-order Auto-correlation          0.042

* 10 data points, ρ = 0.612
Figure 1. Pattern of MAP Growth mathematics and reading means by grade and term
(F=“Fall”, S=“Spring”) for a district. Dashed lines are quadratic curves for each series.
Figure 2. The impact of adding three sound waves.
(Plot, "Adding 3 Sine Curves of Different Amplitudes, Frequencies, and Phases": y1 = 1.4·sin(2x + 0), y2 = 2·sin(1.2x + 60), y3 = 2.2·sin(3x + 40), and the compound wave y1 + y2 + y3, shown for x from -12 to 8.)
Figure 3. A compound polynomial for generic MAP Growth data.
Figure 4. Illustrative example comparing the compound (or additive) polynomial to a
conventional polynomial.
(Plot, "Adding 2 Regular Polynomial Curves": RIT on the left vertical axis, the Fall-Spring slope on the right vertical axis, and Time on the horizontal axis; y1 = 230 + 2·Time − 0.7·Time² + 0.02·Time³, y2 = 0.8 − 0.2·Time + 0.1·Time² − 0.02·Time³, and their sum y1 + y2.)
Paper 2
Title: Summer Learning Loss and Student Learning Trajectories
Background / Context:
The question of whether student learning is negatively impacted by summer vacation has
been of interest for researchers for a long time (Cooper, Nye, Charlton, Lindsay, & Greathouse,
1996; Phillips, Crouse, & Ralph, 1998). Of particular interest is whether summer learning rates
differ by student characteristics, such as socioeconomic status (SES) or race/ethnicity, which
could contribute to inequalities in academic trajectories (Quinn, Cooc, McIntyre, & Gomez, 2016).
Researchers have typically used fairly basic standardized statistics or regression models with two
time points (fall and spring) to obtain population estimates or group differences in summer
learning. These estimates provide a broad overview of summer learning patterns at an aggregate
level, but potentially mask a great deal of variability across students and do not provide
meaningful information regarding the degree to which experiencing summer learning loss is
associated with students' academic trajectories.
Purpose:
The purpose of this paper is to embed the study of summer learning loss in a longitudinal
analysis of student academic growth across school years. We estimate the variability in students’
summer learning across individuals, as well as the association between summer learning and
learning rates across the school years (e.g., growth in reading and math across elementary and
middle school). We also will explore whether minority students are more likely to experience
summer learning loss than non-minority students.
Setting:
The data for the current study come from the Measures of Academic Progress (MAP)
assessment, which is administered to school-age students across the U.S. The MAP is a computer
adaptive test that assesses student performance in math and reading, and is administered multiple
times per school year, typically in the fall and spring. Test scores are reported in an IRT-based
metric, which is equal-interval scaled, allowing for measures of growth across grades.
Population / Participants / Subjects:
The dataset includes a cohort of students enrolled in public schools who took the MAP
assessment between 2004 and 2008 in a single school district in a midwestern state. We follow
students in this district across a five-year longitudinal pattern (fourth through eighth grade, with
MAP assessments in the fall and spring of each school year). Table 1 presents sample sizes by
grade and year of data. Our analyses include 3,693 students for whom we observe math and
reading achievement scores from fourth through eighth grade.
Research Design:
The goal of the study is to obtain estimates of student score drops/increases over the
summer as well as patterns of growth across the elementary/middle school years. We estimated
overall trajectories and summer learning patterns using a growth curve model that accounts for
the seasonality of student assessment data. In this model, we treat the fall and spring MAP test
scores (level 1) as nested within students (level 2). A compound polynomial model, which is
described in greater detail in the first paper of this symposium as well as in Thum & Matta
(2015; 2016), was specified for this study. The compound polynomial set-up allows for the
simultaneous estimation of overall learning trajectory from fall of fourth to fall of eighth grade
and the average rate of spring-to-fall (summer) growth across grades. Time is centered in the
model so that fall of fifth grade is the intercept. In the multilevel set-up, we can also include
person-level predictors such as gender and race/ethnicity to better understand how both summer
and within-school learning rates vary across students. The equations for the growth model are
provided in Appendix B. The lmer function in the lme4 package in R (Bates, Maechler, Bolker, &
Walker, 2015) was used to estimate the longitudinal hierarchical linear models.
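A minimal lme4 sketch of this specification, assuming a long data frame dat with the MAP score (map), a student identifier (sid), and design columns X1 through X4 coded as in the Appendix B table (X0 is the intercept), is shown below; the actual estimation code may differ.

```r
library(lme4)

# Compound polynomial HLM following Equation 1 in Appendix B:
# fixed effects for X1-X4; random student intercept (v00), Fall linear
# slope (v10), and spring-to-fall gap (v30), matching the Level-2 model.
fit <- lmer(map ~ X1 + X2 + X3 + X4 + (1 + X1 + X3 | sid),
            data = dat, REML = TRUE)
summary(fit)
```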
MAP testing schedules are set by users, and test dates typically span a window of 6 to 8 weeks
in each testing season. To better relate learning to the amount of instruction students received,
we employ an estimate of the allotted instructional time (in weeks), based on a database of
NWEA partner district calendars, instead of test dates as the time scale for describing
achievement growth (Thum & Hauser, 2015).
Findings / Results:
We have conducted preliminary analyses using the math MAP assessment following one
cohort of students as they move from fourth through eighth grade. Table 2 presents the
regression coefficients from the compound polynomial model. To better represent the meaning
of these coefficients, Figure 1 shows the predicted overall trajectory as well as the separate fall-
to-fall growth (red line) and spring-to-fall (green line) components of the compound polynomial
model. The intercept term represents the fall status in 5th grade (208.9), and the estimated fall-to-
fall slope across grade-levels is 9.55 (slope of the red curve). The estimated spring-to-fall
(summer) score differences are displayed as a separate green line. The fourth model coefficient
(β3̂) represents the difference between spring and fall, where a negative value indicates the fall
score is higher (e.g., summer learning), while a positive value indicates fall is lower than spring
(e.g., summer drop). In the summer before 5th grade, students increased an average of 1.34 points
(β3̂ = −1.34). Across grade levels, the spring-to-fall gains decrease and become an average
summer loss by the summer before eighth grade.
Figure 2 displays the predicted trajectory as well as individual trajectories for a random
subset of students. It is clear that the overall trajectory seen in Figure 1 masks a great deal of
individual variability seen in the lines in the background. While many students showed increases,
a fair number of students displayed serious loss in MAP scores over the summer periods. More
analyses will be conducted to understand how student and school characteristics explain this
variability.
Conclusions:
Advanced modeling techniques such as the compound polynomial model can shed light
on average summer learning patterns as well as characteristics of students and schools that are
associated with summer test score drop offs. By identifying the groups of students and grade
levels in which summer learning loss is most often observed, policies can be better targeted to
alleviate gaps for the most vulnerable students.
Paper 2 Appendix A.
References
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using
lme4. Journal of Statistical Software, 67(1), 1-48.
Cooper, H., Nye, B., Charlton, K., Lindsay, J., & Greathouse, S. (1996). The effects of summer
vacation on achievement test scores: A narrative and meta-analytic review. Review of
Educational Research, 66 (3), 227.
Phillips, M., Crouse, J., & Ralph, J. (1998). Does the Black-White test score gap widen after
children enter school. The Black-White Test Score Gap, 229–272. Washington, DC:
Brookings Institution Press.
Quinn, D. M., Cooc, N., McIntyre, J., & Gomez, C. J. (2016). Seasonal dynamics of academic
achievement inequality by socioeconomic status and race/ethnicity: Updating and
extending past research with new national data. Educational Researcher, 45(8), 443–453.
Thum Y. M., & Hauser, C. H. (2015). NWEA 2015 MAP Norms for Student and School
Achievement Status and Growth. NWEA Research Report. Portland, OR: NWEA
Thum, Y.M., & Matta, T. (2015, May). Predicting College Readiness from Interim Assessment
Results: Selection Modeling for Longitudinal Data. Paper presented at the Modern
Modeling Methods (M3) Conference, Neag School of Education, University of
Connecticut, CT.
Thum, Y. M., & Matta, T. H. (2016, April). Fitting Curves to Data Series with Seasonality using
the Additive Polynomial (AP) Growth Model. Presented at the Annual Meeting of the
National Council on Measurement in Education, Washington, DC.
Paper 2 Appendix B. Tables and Figures and Equations
Table 1. Number of unique students, sample MAP score means, and compound polynomial
predicted values by school term

Term          Grade   Sample size   Sample mean   Predicted value
2004 Fall 4 260 196.77 198.48
2004 Spring 4 329 203.32 207.52
2005 Fall 5 1550 208.07 208.87
2005 Spring 5 1714 217.03 216.68
2006 Fall 6 3022 220.01 217.55
2006 Spring 6 3029 225.94 224.14
2007 Fall 7 3035 225.13 224.52
2007 Spring 7 3011 230.31 229.88
2008 Fall 8 2979 231.20 229.78
2008 Spring 8 2967 235.65 233.92
Table 2. Coefficients from the compound polynomial model
Fixed effects
Term Estimate Std. Error t-value
Fall - status (5th grade) 208.87 0.26 790.50
Fall linear growth 9.55 0.12 77.80
Fall quadratic growth -0.86 0.03 -25.50
Spring-to-fall difference (4th-5th grade) -1.34 0.21 -6.50
Spring-to-fall linear growth 0.47 0.09 5.20
Variance components
Term                               Variance   Std. Dev.
Fall - status (5th grade)           207.49      14.40
Fall linear growth                    2.69       1.64
Spring-fall gap (4th-5th grade)       0.45       0.67
Residual                             28.85       5.37
Figure 1. Overall trajectory, fall-to-fall, and spring-to-fall components from the compound
polynomial model
Note. The black line with diamonds represents the predicted trajectory of scores estimated by the compound
polynomial model. For demonstration, we also split up the terms of the model and plotted the separated components
to clarify the structure of the model. The red line represents the first half of model (β0̂X0 + β1̂X1 + β2̂X2) that
describes the change in fall status across grade-level. The green line (on the scale shown on the right axis) represents
the change over time in the spring-to-fall gaps (β3̂X3 + β4̂X4).
Figure 2. Overall trajectory and individual trajectories of MAP data across grades 4-8
Note. The black line with large circles represents the predicted trajectory for the sample. The colored lines represent
individual observed trajectories for a random subsample of the cohort. The fall-to-spring growth is represented by
dashed lines while the summer (spring-to-fall) growth is represented by solid lines.
Equation 1. Basic Structure of the Compound Polynomial Hierarchical Linear Model
Level-1 Model (repeated observations of MAP scores (t) within students (i)):

$\text{MAP}_{ti} = \sum_{k=0}^{4} \beta_{ki} X_{kti} + e_{ti}$

Level-2 Model (students (i)):

$\beta_{0i} = \gamma_{00} + v_{00i}$, where $\gamma_{00}$ is the predicted Fall score at 5th grade
$\beta_{1i} = \gamma_{10} + v_{10i}$, where $\gamma_{10}$ is the linear growth rate of change of Fall scores
$\beta_{2i} = \gamma_{20}$, where $\gamma_{20}$ is the quadratic growth rate of change of Fall scores
$\beta_{3i} = \gamma_{30} + v_{30i}$, where $\gamma_{30}$ is the predicted Spring-Fall difference in 5th grade
$\beta_{4i} = \gamma_{40}$, where $\gamma_{40}$ is the linear growth rate of change for the Spring-Fall difference
A simplified version of the X𝑖 coding for an individual with all observed timepoints is presented
below:
Grade/Term    X0 (Int.)   X1 (Fall linear)   X2 (Fall quadratic)   X3 (Spring-Fall Int.)   X4 (Spring-Fall linear)
4th Fall      1           -1                  1                    0                       0
4th Spring    1            0                  0                    1                       0
5th Fall      1            0                  0                    0                       0
5th Spring    1            1                  1                    1                       1
6th Fall      1            1                  1                    0                       0
6th Spring    1            2                  4                    1                       2
7th Fall      1            2                  4                    0                       0
7th Spring    1            3                  9                    1                       3
8th Fall      1            3                  9                    0                       0
8th Spring    1            4                 16                    1                       4
Note. The first three terms (X0 − X2) represent a standard quadratic growth model estimating
change in fall status from 4th-8th grade, with time centered at 5th grade. The second set of terms
(X3 − X4) represent the part of the model where spring-to-fall gaps are estimated. Since time is
centered around 5th grade fall, X3 represents the predicted difference in scores between 4th grade
spring and 5th grade fall, and X4 represents the change in the estimated gap across grade-levels.
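For reference, the coding above can be generated directly from grade and term; the sketch below assumes a data frame dat with integer grade (4 to 8) and term coded "Fall" or "Spring" (the data frame and variable names are placeholders).

```r
# Construct the X1-X4 columns from grade and term, with time centered at
# 5th grade fall, reproducing the coding table above.
dat$X1 <- (dat$grade - 5) + (dat$term == "Spring")  # Fall-status linear term
dat$X2 <- dat$X1^2                                  # Fall-status quadratic term
dat$X3 <- as.integer(dat$term == "Spring")          # spring indicator (spring-to-fall gap)
dat$X4 <- dat$X3 * dat$X1                           # linear change in the gap across grades
```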
Paper 3
Title: Estimating School Value-Added Using a Student Growth Model: Implications for
Practice and Policy
Background/ Purpose:
Teacher and school value added models (VAM) and estimates are best known for their
uses in high-stakes accountability systems. In practice, some districts and states have used VAM
estimates to remediate or terminate extremely low-performing teachers, including the District of
Columbia (Dee & Wyckoff, 2015). Under the federal Race to the Top competition, many states
chose to identify the bottom 5% of schools and implement comprehensive reforms in those
schools (Baker, Oluwole, & Green, 2013). In both cases, consequences for low-performing
teachers and schools can be severe.
Most of the VAMs used in research and practice regress the student’s post-test score on a
pre-test score, covariates, and a teacher or school fixed or random effect. Oftentimes, these
scores are standardized to have a mean of zero and variance of one or are treated as ordinal,
essentially stripping out the psychometric properties of the scale. These models can shed light
on whether teachers or schools contribute to re-ordering of students between two time points.
Decisions to coarsen the data are often made due to a growing literature on the practical
consequences of wrongly assuming an interval scale, the implications of which are great when
VAM estimates are used in high-stakes accountability regimes (Briggs & Domingue, 2013;
Briggs & Weeks, 2009; Soland, 2017).
Purpose:
One might argue a question much closer to the policy intent behind VAMs is whether
teachers or schools contribute to the learning gains of students over the course of their time in K-
12 schools. That is, rather than look at rankings of students, models could instead be developed
to quantify teacher and school contributions to how students develop as learners. In this paper,
we fit VAMs using a baseline student growth model that better matches this second concept of
long-term learning, compare estimates of school effectiveness to ones from more traditional
VAMs, and highlight inferences that can be made from VAMs that are layered on to student
growth models. Ultimately, this study explores what might come from inverting the normal
scholarly dialog on VAM estimates conducted in the literature: rather than ask what invalid and
improper inferences might be drawn from VAM estimates in a high-stakes context, we ask what
useful inferences may be drawn from thinking of VAMs as low-stakes tools for educators.
Specifically, we ask two research questions:
1. How different are school-level VAM estimates produced by traditional models versus
those that use a vertical scale to estimate a student growth model with school random
effects?
2. What are examples of inferences that can be drawn from estimates of school quality
based on an underlying student growth model that are useful to educators and cannot
necessarily be made using an ordinal scale?
Population / Participants / Subjects:
The analytical sample consists of data from a Southern state where the majority of
students take MAP Growth, an interim assessment with a vertical scale. Three years of data are
used with students beginning in fall of 6th grade and ending in spring of 8th grade. MAP
Growth is often administered three times per year in the fall, winter, and spring, though winter
administrations of the test are less frequent. Table 1 shows counts of students and other relevant
demographic information by time period and term. As the table makes clear, while the data
follow students over time, we do not use an intact cohort design: students can enter and exit the
sample at any time so long as they attended a school in the state during 8th grade. For the
purposes of estimating VAM, each student is associated with the school they attended in fall of
8th grade (total of 266 schools overall).
Methods & Research Design:
We fit two VAMs, then compare estimates from those models. Models are estimated
separately in math and reading. The first VAM is the traditional model estimated for student i in
school j for time t and test score 𝑌𝑡𝑖𝑗.
$Y_{tij} = \beta_0 + \beta_1 Y_{t-4,ij} + \mathbf{X}_{ijst}\delta + \gamma_s + \varepsilon_{ijst}$
Here, 𝑿𝑖𝑗𝑠𝑡 is a matrix of student- and school-level covariates and 𝛾𝑠 is a school random effect.
Scores from spring of 8th grade are regressed on scores from spring of 7th grade, both
standardized within time period to have mean of zero and unit variance.
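A minimal sketch of this traditional VAM in lme4, with placeholder variable names (y8 and y7 for the spring scores, frl and male for covariates, school for the school identifier), might look like the following; it illustrates the specification and is not the authors' code.

```r
library(lme4)

# Standardize the 8th- and 7th-grade spring scores within their time periods.
dat8$z8 <- as.numeric(scale(dat8$y8))
dat8$z7 <- as.numeric(scale(dat8$y7))

# Post-test on pre-test, covariates, and a school random effect (gamma_s).
vam_trad  <- lmer(z8 ~ z7 + frl + male + (1 | school), data = dat8)
school_va <- ranef(vam_trad)$school   # empirical Bayes estimates of school effects
```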
By comparison, a VAM that models individual student growth is also fit such that:
Level 1

$Y_{tij} = \pi_{0ij} + \pi_{1ij}\,time_{tij} + \pi_{2ij}\,time_{tij}^{2} + \pi_{3ij}\,time_{tij}^{3} + \varepsilon_{tij}$

Level 2

$\pi_{0ij} = \beta_{00j} + r_{0ij}$
$\pi_{1ij} = \beta_{10j} + r_{1ij}$
$\pi_{2ij} = \beta_{20j}$
$\pi_{3ij} = \beta_{30j}$

Level 3

$\beta_{00j} = \gamma_{000} + u_{00j}$
$\beta_{10j} = \gamma_{100} + u_{10j}$
$\beta_{20j} = \gamma_{200}$
$\beta_{30j} = \gamma_{300}$

with covariance structure

$\begin{pmatrix} u_{00j} \\ u_{10j} \end{pmatrix} \sim N\!\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \tau_{00}^{2} & \tau_{01} \\ \tau_{10} & \tau_{11}^{2} \end{pmatrix}\right], \qquad
\begin{pmatrix} r_{0ij} \\ r_{1ij} \end{pmatrix} \sim N\!\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \varphi_{00}^{2} & \varphi_{01} \\ \varphi_{10} & \varphi_{11}^{2} \end{pmatrix}\right], \qquad
\varepsilon_{tij} \sim N(0, \sigma_{\varepsilon}^{2})$
This formulation includes a student growth model with a random student intercept and slope on
time. It also includes school random intercepts and random slopes on time.
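A corresponding lme4 sketch of the growth-based VAM is given below; it omits covariates, assumes student identifiers are unique across schools, and uses placeholder names (rit, time, sid, school).

```r
library(lme4)

# Cubic student growth model on the vertical RIT scale with random student
# and school intercepts and linear slopes on time, mirroring Levels 1-3 above.
vam_growth <- lmer(rit ~ time + I(time^2) + I(time^3) +
                     (1 + time | sid) +       # student-level r0ij, r1ij
                     (1 + time | school),     # school-level u00j, u10j
                   data = datL, REML = TRUE)

# School contributions to growth are summarized by the school random slopes on time.
ranef(vam_growth)$school
```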
Findings/Results:
Table 2 shows crosstabulations of traditional and growth VAM estimates by quintile of
their respective distributions. Incorporating student growth into the model changes
classifications of schools moderately. For example, 13% of schools in the bottom two quintiles
based on traditional estimates would be in the top two quintiles using a growth model. Table 3
shows a similar crosstabulation, this time of estimated school-level achievement at time 0 and
mean within-school growth across time periods. This table suggests that some schools ranked
high or low in terms of baseline achievement would be ranked quite differently based on
estimated growth rates.
Conclusions:
Results suggest that VAMs can be used to help understand long-term contributions to
student learning. This approach is categorically different from that taken in most VAMs in practice,
which examine school contributions to changes in rank orderings of students between two
timepoints. This finding suggests that VAM estimates have untapped potential as data to inform
practice, a potential that is hard to realize when estimates are also used for accountability
purposes, which often necessitates more cautious approaches like ordinal models.
Paper 3 Appendix A.
References
Baker, B. D., Oluwole, J., & Green, P. C. (2013). The legal consequences of mandating high
stakes decisions based on low quality information: Teacher evaluation in the Race-to-the-Top
era. Education Evaluation and Policy Analysis, 21(1).
Briggs, D. C., & Domingue, B. (2013). The gains from vertical scaling. Journal of Educational
and Behavioral Statistics, 38(6), 551-576.
Briggs, D. C., & Weeks, J. P. (2009). The sensitivity of value-added modeling to the creation of
a vertical score scale. Education Finance and Policy, 4(4), 384–414.
Dee, T. S., & Wyckoff, J. (2015). Incentives, selection, and teacher performance: Evidence from
IMPACT. Journal of Policy Analysis and Management, 34(2), 267–297.
Soland, J. (2017). Is Teacher Value Added a Matter of Scale? The Practical Consequences of
Treating an Ordinal Scale as Interval for Estimation of Teacher Effects. Applied
Measurement in Education, 30(1), 52–70.
Paper 3 Appendix B. Tables and Figures
Table 2
Cross-tabulations of Traditional and Growth VAM Estimates by Quintile
                           Growth VAM quintile
Traditional VAM quintile    1    2    3    4    5   Total
1                          28   14    8    3    1    54
2                          17   16    9    9    2    53
3                           6   12   10   16    9    53
4                           3    8   16   13   13    53
5                           0    4    9   13   27    53
Total                      54   54   52   54   52   266
Table 3
Cross-tabulations of Achievement at Time 0 and Mean within-School Growth Rate by Quintile
                           Achievement at Time 0 quintile
Mean growth quintile        1    2    3    4    5   Total
1                          37    8    3    5    0    53
2                           6   29   14    1    1    51
3                           4    7   24   16    1    52
4                           3    6    9   21   13    52
5                           0    3    1   10   38    52
Total                      50   53   51   53   53   260
Table 1
Descriptive Statistics for Analytical Sample by Time Period
Time Term Grades Students Prop. Black Prop. Hisp. Prop. Male
0 Fall 2011 6 43,873 0.355 0.061 0.506
1 Winter 2012 6 21,033 0.388 0.059 0.510
2 Spring 2012 6 42,151 0.356 0.060 0.505
3 Fall 2012 7 41,824 0.356 0.060 0.504
4 Winter 2013 7 20,772 0.384 0.056 0.508
5 Spring 2013 7 40,386 0.358 0.061 0.502
6 Fall 2013 8 37,180 0.362 0.062 0.502
7 Winter 2014 8 15,605 0.439 0.065 0.507
8 Spring 2014 8 35,998 0.362 0.062 0.502
Paper 4
Title: Dynamic Measurement Modeling: Using Nonlinear Growth Models to Estimate Student
Learning Capacity
Background and Purpose:
Psychometric assessments, as they have traditionally been applied in the educational
setting, solely measure abilities and skills that students have developed prior to the occasion of
testing, and consequently cannot tap a student’s capacity for developing those abilities in the
future (Sternberg et al., 2002). Despite this recognized disconnect between developed abilities
and developing capacities, scores on single-time-point educational or psychological measures are
all-too-often misinterpreted as relating to student potential. For this reason, students who may
not have had adequate opportunity to develop a given ability—and therefore score poorly on a
performance assessment—may be officially judged as not having the capacity for developing
that ability, and as such may not be given the resources and attention they need from educators in
order to meet their potential (Lohman, 1999).
One methodology that has been historically utilized to address this problem is dynamic
assessment (DA; Feuerstein, 1979; Tzuriel, 2001). Because DA features multiple testing
occasions, integrated with instruction by a clinician, it is capable of estimating a student’s
capacity for developing a particular skill or ability. Please see Figure 1 for a graphical depiction
of the theoretical conceptualization of student ability growth and predicted learning capacity that
is adapted from the DA literature. Unfortunately, because widely applying DA in any
educational system would entail substantial time investment by trained clinicians, the monetary
requirements of such extensive application are beyond that currently available to most state
systems, school districts, and educational research groups.
However, recent advances in non-linear growth modeling and statistical computing, as
well as the proliferation of reliable longitudinal data pertaining to the educational achievement of
U.S. students, offer an alternative solution. Specifically, a new psychometric modeling
framework—Dynamic Measurement Modeling (DMM)—is now capable of accomplishing many
of the goals of DA through the modeling of longitudinal testing data, without the need for
extensive one-on-one clinical work (Dumas & McNeish, 2017; McNeish & Dumas, 2017).
In general, DMM utilizes vertically scaled longitudinal data to estimate subject-specific
random effects for a number of growth parameters associated with learning, including each
student's predicted capacity asymptote. A major motivation for the creation of DMM was to
quantitatively produce estimates of student learning capacity that are relatively free from the
undue influence of student demographic characteristics such as socioeconomic status (SES),
race/ethnicity, and gender. However, prior to the completion of the current study, the actual
efficacy of DMM to accomplish this goal had not been formally tested. Therefore, we here
conduct and present just such an investigation.
Setting:
To address this critical research question, we utilized the Early Childhood Longitudinal
Survey- Kindergarten (ECLS-K) 1999 cohort (Tourangeau, et al., 2009). These publicly
available data were collected at seven time-points: fall and spring of kindergarten, fall and spring
of Grade 1, spring of Grade 3, spring of Grade 5, and spring of Grade 8. In this analysis, we
utilized mathematics assessment scale scores (not individual items), which were vertically scaled
across time-points.
Research Design:
To these data, we fit a DMM capable of modeling the growth trajectory of every
individual student in the dataset. Please see Figure 2 for the growth trajectories and capacity
asymptotes of 50 randomly selected ECLS-K participants. This model, termed the "full model,"
included data spanning from kindergarten through eighth grade. Another model, termed the
"reduced model," was fit to data spanning only through grade 5. All subject-specific estimates for
both of these models were saved for later analysis.
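For intuition only, the sketch below fits an asymptotic growth curve with a student-specific upper asymptote in nlme; the published DMM is estimated differently (see Dumas & McNeish, 2017), and the negative-exponential form, data frame, and variable names here are assumptions.

```r
library(nlme)

# Asymptotic growth curve: each student receives a random asymptote (Asym),
# interpreted here as the predicted learning capacity, and a random starting
# level (R0); lrc is the log rate constant of the self-starting SSasymp model.
# Assumes long-format data `ecls` with columns score, years, and student id.
dmm_sketch <- nlme(score ~ SSasymp(years, Asym, R0, lrc),
                   fixed  = Asym + R0 + lrc ~ 1,
                   random = Asym + R0 ~ 1 | id,
                   data   = ecls)

capacity <- coef(dmm_sketch)$Asym   # subject-specific capacity estimates
```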
Findings/Results:
Reliability and Convergent Validity. In order to ascertain the reliability of DMM
capacity estimates, we correlated the subject-specific capacity estimates from the full and
reduced model. The resulting value of r = .934 indicated that the asymptotic capacity predictions
were very stably estimated across models. In fact, this correlation exceeds the correlation of r =
.836 found between Grade 5 and Grade 8 mathematics scores by a comfortable margin. The
correlations between capacity estimates and scale scores are also moderately high (range: .679 to
.771) which appears to provide satisfactory convergent validity evidence, suggesting that student
capacities are positively related to single-time-point assessments, but not so strongly related as
to suggest they are synonymous.
Consequential Validity. In a sequence of general linear models (GLMs) we tested the
effect of student demographic characteristics (i.e., race/ethnicity, SES, and gender) on single-
time-point assessments as well as DMM capacity estimates. In this analysis, SES was quantified
through a principal components analysis of a variety of student background variables present in
the ECLS-K data. All GLM analysis was conducted on the “full model” DMM results that
included ECLS-K data from kindergarten through eighth grade.
Figure 3 shows a plot of the GLM omnibus R2 values. Note that for the ECLS-K
mathematics scores, the omnibus R2 values fall between 15.8% and 22.8%. On the other hand,
the R2 value for capacity is 9.9%: approximately half that of the ECLS-K score GLMs. That is,
demographic characteristics explain noticeably less of the variation in capacity estimates than in
the single-time-point scores.
We present the effect sizes related to SES in Figure 4. Effect sizes depicted in Figure 4
are Cohen’s f, which fall on the following scale: .10, .25, and .40 for small, medium and large
effects, respectively (Cohen, 1992). Effect sizes below .10 are considered negligible. The effect
of SES on ECLS-K mathematics scale scores would be classified on the high side of a small
effect, at times approaching a medium effect. However, the effect size for SES on capacity is
noticeably smaller than each of the scale scores and is short of the small effect cut-off (i.e., it is
negligible) by a reasonable margin.
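One way such quantities could be computed in R is sketched below; the outcome and predictor names, and the use of the car package for partial sums of squares, are assumptions rather than the authors' analysis code.

```r
# Regress an outcome (a scale score or the DMM capacity estimate) on gender,
# race/ethnicity, SES, and their two- and three-way interactions; then compute
# the omnibus R-squared and Cohen's f for SES from partial eta-squared.
glm_fit    <- lm(outcome ~ gender * race * ses, data = analysis_df)
r2_omnibus <- summary(glm_fit)$r.squared

aov_tab  <- car::Anova(glm_fit, type = 2)      # partial sums of squares
ss_ses   <- aov_tab["ses", "Sum Sq"]
ss_resid <- aov_tab["Residuals", "Sum Sq"]
eta2_ses <- ss_ses / (ss_ses + ss_resid)
f_ses    <- sqrt(eta2_ses / (1 - eta2_ses))    # Cohen's f for the SES effect
```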
Conclusions:
In our view, this finding implies that impoverished students, despite having developed
less mathematics ability on average than their more privileged peers by 8th grade, nonetheless
retain a practically equal capacity for learning within that domain in the future. This type of
conclusion is not readily attainable with most other available types of psychometric or
longitudinal methods. Therefore, we argue that DMMs hold substantial promise for informing
measurement practice and educational research.
Paper 4. Appendix A.
References
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.
Dumas, D., & McNeish, D. (2017). Dynamic measurement modeling: Using nonlinear growth
models to estimate student learning capacity. Educational Researcher, 46(6), 284-292.
DOI: 10.3102/0013189X17725747
Feuerstein, R., Rand, Y., & Hoffman, M. B. (1979). The dynamic assessment of retarded
performers: the learning potential assessment device, theory, instruments, and techniques.
Baltimore: University Park Press.
Lohman, D. F. (1999). Minding our p's and q's: On finding relationships between learning and
intelligence. Learning and individual differences: Process, trait, and content
determinants, 55-76
McNeish, D. & Dumas, D. (2017). Non-linear growth models as measurement models: A
second-order growth curve model for measuring potential. Multivariate Behavioral
Research, 52 (1), 61-85.
Sternberg, R. J., Grigorenko, E. L., Ngorosho, D., Tantufuye, E., Mbise, A., Nokes, C., ... &
Bundy, D. A. (2002). Assessing intellectual potential in rural Tanzanian school children.
Intelligence, 30, 141-162.
Tourangeau, K., Nord, C., Lê, T., Sorongon, A. G., & Najarian, M. (2009). Early Childhood
Longitudinal Study, Kindergarten Class of 1998-99 (ECLS-K): Combined user's manual for the
ECLS-K eighth-grade and k-8 full sample data files and electronic codebooks. NCES
2009-004. Washington: National Center for Education Statistics.
Tzuriel, D. (2001). Dynamic assessment of young children. New York: Kluwer Academic.
Paper 4 Appendix B. Tables and Figures
Figure 1. Theoretical depiction of the dynamic assessment process. The space below the line
is realized ability, the space above the line is unrealized availability, and the horizontal line
at the top is the capacity.
(Plot: Ability Score (0-350) by Elapsed Time (0-8), with the regions labeled Ability and Availability and the horizontal line at top marking Capacity.)
Figure 2. Ability, Capacity and Availability trajectory plots for two random samples of 25
students from the ECLS-K dataset, with superimposed sample mean trajectory (bold)
(Two panels: Predicted Mathematics Score (0-300) by Years After Kindergarten Fall (0-8).)
Figure 3. Plot of omnibus R2 values showing the amount of total variation explained in
Scale Scores and Full Model Capacity estimates by gender, race/ethnicity, SES, and all two
and three-way interactions
Figure 4. The effect size (Cohen’s f) of SES on ECLS-K scale scores and Full Model
capacity estimates. The dashed horizontal line at .10 represents the cut-off for a “small”
effect, the dashed line at .25 represents the cut-off for a “medium” effect.
(Figure 3 values: omnibus R² of .189, .181, .158, .178, .202, .228, and .227 for the ECLS-K mathematics scale scores and .099 for the Full Model capacity estimates. Figure 4 values: Cohen's f of .237, .222, .196, .163, .196, .193, and .204 for the mathematics scale scores and .071 for capacity.)