
Looming, Plotting, and Animating Historical Time Series Measures of Direct and Indirect Assessments of Student Learning Outcomes from Multiple Credential Programs Using R and R-Studio

Chris Boosalis Sacramento State, United States

[email protected]

Oddmund Myhre California State University, Stanislaus, United States

[email protected]

Abstract

Collecting direct and indirect measures of student learning outcomes is a staple of assessment and accreditation activities, but the problem is that the data are often most meaningful at the program level but least meaningful at the aggregated department and unit levels. For example, faculty often review student survey responses (indirect measures) and performance assessments (direct measures) on a given outcome and derive a useful indication of how students perceive their own performance against an objective measure of their actual performance. Complexity interferes with this straightforward process of interpretation when separate programs and their measures are compiled at the department and division/college levels. This paper proposes a workable solution to the problem of aggregating, equating, and interpreting the results of direct and indirect assessments of student learning outcomes through time-series plotting using the open-source statistical program R and R-Studio.

Introduction

In accreditation, colleges and schools of education are commonly expected to collect and analyze data gathered from learning-outcome assessments at the program, department, and unit levels (National Institute for Learning Outcomes Assessment, 2014). All credential programs in California, for example, must write to the Common Standards as established by the California Commission on Teacher Credentialing (2015). California credential programs must align measures against these standards and demonstrate that data collection and data-driven decisions are part of regular practice. Standard Four, the Continuous Improvement standard, lends us an example:

The education unit and its programs regularly assess their effectiveness in relation to the course of study offered, fieldwork and clinical practice, and support services for candidates. Both the unit and its programs regularly and systematically collect, analyze, and use candidate and program completer data as well as data reflecting the effectiveness of unit operations to improve programs and their services. The continuous improvement process includes multiple sources of data including 1) the extent to which candidates are prepared to enter professional practice; and 2) feedback from key stakeholders such as employers and community partners about the quality of the preparation.

The expectations in Standard Four are clear: at the program level, credential programs for elementary, secondary, education specialist, education leadership, and so on are expected to conduct
assessment practices that align logically to these requirements. A secondary alignment of the Common Standards to program standards is the usual next step: because credential programs must meet standards of their own, valid and reliable measures of a program standard are often extant and can be aligned to broad, unit-level common standards. For example, an elementary education program may have an indirect measure, such as a survey item, that targets student satisfaction (Graunke, 2015; Heath et al., 2012). The item could align 1:1 with a required program standard, where program faculty ask students at the end of their program how well prepared (i.e., satisfied) they feel about their ability to differentiate instruction for English learners. Obviously, indirect assessments may be biased and may not tell the full story of student mastery (Heath et al., 2012), so results are interpreted within the context of other, more objective program measures. Continuing with the elementary program example, direct assessments of students may occur at the end of their clinical experience to see how university supervisors and cooperating teachers rate how effectively the students differentiate instruction during practice. Results from both the direct and indirect measures can then be aligned to a unit-level learning outcome, such as "providing essential services to language-minority communities," and linked to an analytic rubric criterion (Mertler, 2001). The rubric criterion and its associated data would align with the associated common standard, and the program can have a sense of how candidates perform on standards and learning outcomes, along with how candidates interpret their own performance.

But there are challenges when moving interpretation beyond the program level. Program faculty are usually accustomed to reviewing both survey and performance-assessment data to make data-driven decisions about programmatic and curricular changes. For example, weak ratings from either self-reports or objective clinical measures should certainly inspire program faculty to identify areas in need of change in the program, student support, and/or the student experience. There are challenges to overcome when implementing and measuring learning outcomes (Hussey & Smith, 2002). Here, when data are aggregated for analysis at the department or unit level, meaning becomes obscured. School psychology faculty or school nursing faculty might not see how their direct and indirect measures fit together with the data from education leadership or special education. In short, while program-level measures are quite meaningful to faculty, aggregating and interpreting these data at the department or unit level can mean that much context is lost in translation.

Mapping Indirect Measures and Direct Measures to Student Learning Outcomes

Curriculum mapping and aligning program-level standards to student learning outcomes is a common activity (Lin, Wrobbel, & Blankson, 2010). One purpose is to preserve the meaningful context of the alignments and measures as they are aggregated and analyzed at the department and unit levels, beyond the program level. For example, a college or school of education may house many credential programs that share the fact that they are preparing candidates to work in schools, but direct, 1:1 comparisons between and among programs are quite complex.
While elementary, secondary, and education specialist programs may share much in common and may even write to the same standards, the same may not be true for other credential programs, such as school nurse, school social worker, or educational psychologist. Here, assessing a standard like "differentiating instruction for English learners" will work for the former but not for the latter group, and this obstacle is usually overcome through the breadth of the language used in the criterion. Broad student learning outcomes are often used as a proxy to unite programs under a common outcome. For example, a hypothetical School of Education or College of Education may have "providing essential services to language-minority communities" as a learning outcome common across programs. Table 1 below provides an example of a hypothetical mapping to standards among related and more distant programs.

Table 1: Hypothetical Alignment of Fictitious Program Standards to an SLO

SLO: Providing essential services to language-minority communities

Program | Aligned Program Standard
Elementary | Differentiating instruction for English learners
Secondary | Differentiating instruction for English learners
Ed Specialist | Differentiating instruction for English learners
Nursing | Culturally responsive health care for diverse linguistic groups
Social Work | Culturally responsive services for diverse linguistic groups
Psychologist | Culturally responsive therapy for diverse linguistic groups

After developing a map of standards to a common student learning outcome, curriculum and assessment maps usually follow. Because programs are normally required to align direct and indirect assessments to their self-contained program standards, an aligned assessment map, such as the hypothetical example below, should be intelligible to the reader. Table 2 provides an example.

Table 2: Hypothetical Alignment of Fictitious Program Standards and Fictitious Direct and Indirect Measures

SLO 1: Providing essential services to language-minority communities

Program | Program Standard | Direct Measure* | Indirect Measure*
Elementary | Differentiating instruction for English learners | Teaching Performance Assessment, Rubric Element 10 | Student Exit Survey, Question 13
Secondary | Differentiating instruction for English learners | Teaching Performance Assessment, Rubric Element 10 | Student Exit Survey, Question 13
Ed Specialist | Differentiating instruction for English learners | Teaching Performance Assessment, Rubric Element 10 | Student Exit Survey, Question 13
Nursing | Culturally responsive health care for diverse linguistic groups | Clinical Assessment, Rubric Element 12 | Program Exit Survey, Question 22
Social Work | Culturally responsive services for diverse linguistic groups | Clinical Assessment, Rubric Element 13 | Clinical Exit Survey, Question 23
Psychologist | Culturally responsive therapy for diverse linguistic groups | Field Experience Assessment, Rubric Element 14 | Field Experience Survey, Question 24

*Note: The reader can consider these measures to be summative evaluations for this purpose.

Data collected against the map in Table 2 might include the program, year collected, department, average of the direct measure, average of the indirect measure, n, collection date, and associated SLO. Table 3 below illustrates how the data might be organized prior to analysis. The hypothetical dataset in Table 3 includes "data" that have been gathered over a three-year period at the end of each semester: elementary, secondary, and education specialist data from a Teaching Credentials department; school nurse and social work data from students in a Health and Human Services department; and school psychology data from an Education Leadership department. The hypothetical direct and indirect measures reflect direct assessments of student learning and indirect student self-assessments (e.g., survey responses) as presented in Table 2.


Table 3: Fictitious Program Assessment Data

In Table 3 above, the column headings present what one would expect programs to collect. To facilitate the example implemented in this article, several assumptions have been made. First, the average n, obtained by adding the number of students participating in the direct assessment and the indirect assessment and dividing by two, is presented in the AVERAGE-N column. Because there might be a discrepancy between, for example, the number of survey respondents and the number of students assessed directly, averaging the n provides the weight that can be attributed to a data point once plotted. The second assumption is that direct and indirect measures have been reported or converted to a four-point scale. While the scale used is arbitrary, a common conversion is necessary to plot the data points. The final assumption is that plotting will occur with the direct measure on the x-axis and the indirect measure on the y-axis. Logically, students' perceptions of their abilities follow their experience with the directly assessed activity; as such, it is likely that one variable depends upon the other.

The method of analysis offered in this article plots historical time-series measures of these direct and indirect assessments of a single, hypothetical student learning outcome using fictitious data drawn from multiple credential programs. The open-source statistical tool R (R Core Team, 2015) and the R package googleVis (Gesmann & de Castillo, 2011) are used for this purpose. A description of the benefits of implementing this procedure follows the presentation of the analysis method and resulting animations.

Plotting and Animating Direct and Indirect Measures of Student Learning Outcomes from a Time Series

The free statistical package R can be downloaded and installed from the link in the reference section for the R Core Team (2015). Specific instructions for Windows, Mac OS, and Linux are available there. The R version used in this article is 3.2.2 (2015-08-14). The googleVis package provides researchers with an easy way to create motion charts from x-y variable data in a time series, along with the relative weight of the n associated with the data points. A wealth of information about the package and its myriad settings can be found in Gesmann and de Castillo (2011). Installing and loading the googleVis package is a simple process of typing the following code at the command prompt in R (note: the Berkeley CRAN server was selected as the download provider for the package).

Examples
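A minimal sketch of the installation and loading commands referenced above, assuming the package is fetched from CRAN, might look like the following (the mirror URL is illustrative; any CRAN mirror, or an interactive mirror prompt, works equally well):

    # Install the googleVis package from a CRAN mirror (URL shown is illustrative)
    install.packages("googleVis", repos = "http://cran.cnr.berkeley.edu/")

    # Load the package into the current R session
    library(googleVis)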


Tables 4-8 illustrate how data can be parsed and analyzed in R and googleVis.
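The charts in those tables were produced interactively; as a hedged sketch of the kind of call involved, the following code builds a small, entirely made-up data frame in the general layout described for Table 3 (simplified here to three programs and annual data points), computes the average n, and passes the result to gvisMotionChart. All column names and values are hypothetical:

    # Hypothetical assessment data in the general shape of Table 3 (values invented)
    assessment <- data.frame(
      PROGRAM    = rep(c("Elementary", "Social Work", "Psychologist"), times = 3),
      DEPARTMENT = rep(c("Teaching Credentials", "Health and Human Services",
                         "Education Leadership"), times = 3),
      YEAR       = rep(2013:2015, each = 3),
      DIRECT     = c(3.1, 3.4, 3.0, 3.3, 3.5, 3.2, 3.6, 3.6, 3.4),  # 4-point scale
      INDIRECT   = c(2.9, 3.2, 3.1, 3.2, 3.3, 3.3, 3.5, 3.4, 3.5),  # 4-point scale
      N_DIRECT   = c(40, 25, 12, 44, 28, 15, 46, 30, 14),
      N_INDIRECT = c(36, 22, 10, 40, 26, 13, 44, 29, 12),
      SLO        = "SLO 1",
      stringsAsFactors = FALSE
    )

    # Average n = (n for the direct measure + n for the indirect measure) / 2
    assessment$AVERAGE_N <- (assessment$N_DIRECT + assessment$N_INDIRECT) / 2

    # Motion chart: direct measure on the x-axis, indirect measure on the y-axis,
    # bubbles sized by average n and colored by department; trails are toggled
    # in the chart's own controls
    M <- gvisMotionChart(assessment,
                         idvar    = "PROGRAM",
                         timevar  = "YEAR",
                         xvar     = "DIRECT",
                         yvar     = "INDIRECT",
                         colorvar = "DEPARTMENT",
                         sizevar  = "AVERAGE_N")

    plot(M)  # opens the animated chart in the default web browser

The motion chart widget itself exposes the alternate bar and line views of the kind alluded to in Table 5 through controls in its own interface.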


Table 4: Time Series Bubble Plot of Direct Measures and Indirect Measures with Trails for Elementary, Social Work, and (School) Psychologist

Table 5: Other Available Plots of Direct Measures and Indirect Measures for Elementary, Social Work, and (School) Psychologist


Table 6: The Initial Position of Data

Table 7: The Secondary Position of Data Midway through the Animation


Table 8: The Final Position of Data at the End of the Animation


Discussion

As seen in Table 4, plotting and animating direct and indirect measures of student learning outcomes from a time series helps faculty, administrators, and researchers overcome a number of challenges associated with traditional methods of analysis. First, this method avoids the need to equate indirect survey items or analytic rubric criteria on direct measures across individual programs to measure a student learning outcome. Because each program's identified direct and indirect measures are plotted on a single chart, it is possible to visualize and locate each program's tendency toward a given SLO. In this example, both indirect and direct measures used a fictitious 4-point rating scale: if a program's coordinates are (4, 4), then it can be said that both students and outside assessors are rating the program highly on this particular SLO, given the ratings on the aligned assessments. Deviations from (4, 4) for any program would be recognizable from the plotted points.

The second advantage that this method provides is the ability to visualize the behavior of the points plotted from direct and indirect measures over time. With trails enabled on the selected programs, it is easy to watch the points rise and fall over time. For example, it is very clear and easy to see whether student self-ratings increase or decrease over time, along with how any changes in the ratings from outside assessors affect the observation. It is possible to observe a program's direct and indirect ratings rise and fall over time and then identify decisions that may or may not have had an effect on student performance and self-reporting. Similarly, one can also observe programs that demonstrate growth in both direct and indirect assessment.

Another advantage is that the n's associated with both the direct and indirect measures become immediately meaningful and visual. By simply clicking on a program's data point, one can see the average n associated with the survey and the direct assessment. The size of the data point provides clarity; moreover, the data point changes in size if the average changes. As such, the richness of the dataset is brought to life as data points rise and fall with measures of direct and indirect ratings of performance, and they expand and contract as the sample size increases or decreases. Note: it is possible to show the data table beneath the chart if one wishes to see the actual n's and not just the average n (a brief sketch appears at the end of this section).

A final advantage of this approach to managing student learning outcome measures is that meaning is maintained at all levels of analysis, from the program to the department to the unit level. Because data are aggregated by plotting the performance on a single chart, it is simple to observe how clusters of programs perform against an SLO over time, because they are color coded by group. In this example, Teaching Credentials, Health and Human Services, and Education Leadership are each represented by a corresponding color and an associated trail. One can see plainly where Teaching Credential programs fall on the plot against either Health and Human Services or Education Leadership (or all points together). A unit-level view of performance over time is easiest of all: it is the view of all available points on the chart and their locations. Like a loom, this process of analysis sets all accreditation data in motion at once.
It allows all members who are involved in data-driven decision making to see the story of their direct and indirect measures woven together into a clear picture of trends and numerical weights. Rather than looking at tables of


means or two-dimensional charts, the data come to life and a truer picture of how students are performing on aligned student learning outcomes can emerge.
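As a hedged sketch of the data-table option noted above, one way to place a plain table of the underlying records beneath the animated chart is to merge a gvisTable with the motion chart; the object names reuse the hypothetical assessment example from the Examples section:

    # Show the underlying records, including the separate direct and indirect n's,
    # in a table rendered beneath the motion chart
    tbl <- gvisTable(assessment[, c("PROGRAM", "YEAR", "DIRECT", "INDIRECT",
                                    "N_DIRECT", "N_INDIRECT", "AVERAGE_N")])
    combined <- gvisMerge(M, tbl, horizontal = FALSE)  # stack: chart on top, table below
    plot(combined)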


References

California Commission on Teacher Credentialing. (2015). Common Standards. Sacramento, CA. Retrieved December 1, 2015, from http://www.ctc.ca.gov/educator-prep/standards/CommonStandardsTeacherPrep-2015.pdf

Gesmann, M., & de Castillo, D. (2011). Using the Google Visualisation API with R. The R Journal, 3(2), 40–44.

Graunke, S. S. (2015). Deep approaches and learning outcomes: An exploration using indirect measures. Journal of the Indiana University Student Personnel Association. Retrieved from http://www.indiana.edu/~iuspa/journal/editions/2015/5%20Deep%20Approaches.pdf

Heath, L., DeHoek, A., & Locatelli, S. (2012). Indirect measures in evaluation: On not knowing what we don't know. Practical Assessment, Research & Evaluation, 17(6). Retrieved from http://pareonline.net/getvn.asp?v=17&n=6

Hussey, T., & Smith, P. (2002). The trouble with learning outcomes. Active Learning in Higher Education, 3, 220–233.

Lin, M., Wrobbel, D., & Blankson, I. (2010). Rethinking program assessment through the use of program alignment mapping technique. Communication Teacher, 24(4), 238–246.

Mertler, C. A. (2001). Designing scoring rubrics for your classroom. Practical Assessment, Research & Evaluation, 7(25). Retrieved December 1, 2015, from http://PAREonline.net/getvn.asp?v=7&n=25

National Institute for Learning Outcomes Assessment. (2014). Knowing what students know and can do: The current state of student learning outcomes in U.S. colleges and universities. Champaign, IL.

R Core Team. (2015). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0. https://www.R-project.org/