teacher evaluation models: a national perspective

Teacher Evaluation Models: A National Perspective Laura Goe, Ph.D. Research Scientist, ETS Principal Investigator for Research and Dissemination, The National Comprehensive Center for Teacher Quality Connecting Research to Practice: Teacher Effectiveness and Evaluation REL Midwest and Minnesota Service Cooperatives September 21, 2011 St. Cloud, MN

Page 1: Teacher Evaluation Models:  A National Perspective

Teacher Evaluation Models: A National Perspective

Laura Goe, Ph.D.

Research Scientist, ETS

Principal Investigator for Research and Dissemination, The National Comprehensive Center for Teacher Quality

Connecting Research to Practice: Teacher Effectiveness and Evaluation

REL Midwest and Minnesota Service Cooperatives September 21, 2011 St. Cloud, MN

Page 2: Teacher Evaluation Models:  A National Perspective

The goal of teacher evaluation

The ultimate goal of all teacher evaluation should be…

TO IMPROVE TEACHING AND LEARNING

Page 3: Teacher Evaluation Models:  A National Perspective

Today’s presentation available online

• To download a copy of this presentation or view it on your internet-enabled device (iPad, smartphone, computer, etc.), go to the Publications and Presentations page at www.lauragoe.com
  - Today’s presentation is at the bottom of the page
  - Also, see the handout “Questions to ask about measures and models” (middle of page)

Page 4: Teacher Evaluation Models:  A National Perspective

Trends in teacher evaluation

• Policy is way ahead of the research in teacher evaluation measures and models
  - Though we don’t yet know which model and combination of measures will identify effective teachers, many states and districts are compelled to move forward at a rapid pace

• Inclusion of student achievement growth data represents a huge “culture shift” in evaluation
  - Communication and teacher/administrator participation and buy-in are crucial to ensure change

• The implementation challenges are enormous
  - Few models exist for states and districts to adopt or adapt
  - Many districts have limited capacity to implement comprehensive systems, and states have limited resources to help them

Page 5: Teacher Evaluation Models:  A National Perspective

How did we get here?

• Value-added research shows that teachers vary greatly in their contributions to student achievement (Rivkin, Hanushek, & Kain, 2005).

• The Widget Effect report (Weisberg et al., 2009) “…examines our pervasive and longstanding failure to recognize and respond to variations in the effectiveness of our teachers.” (from Executive Summary)

Page 6: Teacher Evaluation Models:  A National Perspective

Measures and models: Definitions

• Measures are the instruments, assessments, protocols, rubrics, and tools that are used in determining teacher effectiveness

• Models are the state or district systems of teacher evaluation including all of the inputs and decision points (measures, instruments, processes, training, and scoring, etc.) that result in determinations about individual teachers’ effectiveness

Page 7: Teacher Evaluation Models:  A National Perspective

Multiple measures of teacher effectiveness

• Evidence of growth in student learning and competency
  - Standardized tests, pre/post tests in untested subjects
  - Student performance (art, music, etc.)
  - Curriculum-based tests given in a standardized manner
  - Classroom-based tests such as DIBELS

• Evidence of instructional quality
  - Classroom observations
  - Lesson plans, assignments, and student work
  - Student surveys such as Harvard’s Tripod
  - Evidence binder (next generation of portfolio)

• Evidence of professional responsibility
  - Administrator/supervisor reports, parent surveys
  - Teacher reflection and self-reports, records of contributions

Page 8: Teacher Evaluation Models:  A National Perspective

Measures that help teachers grow

• Measures that motivate teachers to examine their own practice against specific teaching standards

• Measures that allow teachers to participate in or co-construct the evaluation (such as “evidence binders”)

• Measures that give teachers opportunities to discuss the results with evaluators, administrators, colleagues, teacher learning communities, mentors, coaches, etc.

• Measures that are aligned with professional development offerings

• Measures which include protocols and processes that teachers can examine and comprehend

• Measures that provide information teachers can use to make immediate adjustments in instruction

Page 9: Teacher Evaluation Models:  A National Perspective

Validity is a process

• Herman et al. (2011) state, “Validity is a matter of degree (based on the extent to which an evidence-based argument justifies the use of an assessment for a specific purpose).” (pg. 1)

• Starts with defining the criteria and standards you want to measure, then choosing measures

• Requires judgment about whether the instruments and processes are giving accurate, helpful information about performance

• Verify validity by
  - Comparing results on multiple measures
  - Multiple time points, multiple raters

Page 10: Teacher Evaluation Models:  A National Perspective

Validity of classroom observations is highly dependent on training

• Even with a terrific observation instrument, the results are meaningless if observers are not trained to agree on evidence and scoring

• A teacher should get the same score no matter who observes him
  - This requires that all observers be trained on the instruments and processes
  - Occasional “calibrating” should be done; more often if there are discrepancies or new observers
  - Who the evaluators are matters less than that they are adequately trained and calibrated
  - Teachers should also be trained on the observation forms and processes to improve validity of results
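
One way to check whether observers are calibrated is to have two of them score the same lessons and compare the results. The Python sketch below is only an illustration: the 1-4 rubric scale, the function names, and the sample scores are assumptions, not part of the presentation. It computes the exact agreement rate and Cohen's kappa, which discounts the agreement expected by chance.

```python
from collections import Counter

def exact_agreement(rater_a, rater_b):
    """Share of lessons on which both observers gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty score lists")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement between two observers, corrected for chance agreement."""
    n = len(rater_a)
    observed = exact_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same score at random,
    # given how often each rater used each score.
    expected = sum(counts_a[s] * counts_b[s] for s in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical score throughout
    return (observed - expected) / (1 - expected)

# Hypothetical scores from two observers on the same eight lessons (1-4 rubric).
observer_1 = [3, 2, 4, 3, 1, 2, 3, 4]
observer_2 = [3, 2, 3, 3, 1, 2, 4, 4]
print(exact_agreement(observer_1, observer_2))
print(round(cohens_kappa(observer_1, observer_2), 3))
```

A low kappa after training would suggest that another calibration session is needed before scores are used.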

Page 11: Teacher Evaluation Models:  A National Perspective

Value-added and Colorado Growth Model

• EVAAS uses prior test scores to predict the next score for a student

• Teachers’ value-added is the difference between actual and predicted scores for a set of students

• Colorado Growth Model (Betebenner, 2008):
  - Focus on “growth to proficiency”
  - Measures students against “academic peers”

• Ongoing concerns about validity of using growth models for teacher evaluation
  - Researchers have raised numerous cautions (see my July 28, 2011 Texas and Southeast Comp Center presentation for recent studies and findings)
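
The value-added definition on this slide comes down to simple arithmetic on actual and predicted scores. The sketch below assumes the predicted scores are already available and that the per-student differences are averaged. Both are simplifying assumptions: a system such as EVAAS derives predictions from prior scores with a statistical model that is not shown here, and the function name and numbers are hypothetical.

```python
from statistics import mean

def value_added(actual_scores, predicted_scores):
    """Average gap between actual and predicted scores for one teacher's students."""
    if len(actual_scores) != len(predicted_scores) or not actual_scores:
        raise ValueError("need two equal-length, non-empty score lists")
    return mean(a - p for a, p in zip(actual_scores, predicted_scores))

# Hypothetical scores for four students taught by the same teacher.
actual = [52, 61, 47, 70]
predicted = [50, 58, 49, 66]
print(value_added(actual, predicted))  # positive means students beat their predictions on average
```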

Page 12: Teacher Evaluation Models:  A National Perspective

Evidence of teachers’ contribution to student learning growth

• Value-added can provide useful evidence of teachers’ contributions to student growth

• “It is not a perfect system of measurement, but it can complement observational measures, parent feedback, and personal reflections on teaching far better than any available alternative.” (Glazerman et al., 2010, p. 4)

Page 13: Teacher Evaluation Models:  A National Perspective

Technical challenges (value-added)

• Teacher effectiveness scores for an individual teacher can vary considerably across statistical models. (Newton et al., 2010)

• Student characteristics may affect teacher rankings even when using statistical controls.
  - An individual teacher’s ranking may vary between courses/years depending on student characteristics, with lower effectiveness ranks when teaching less advantaged students (Newton et al., 2010)

Page 14: Teacher Evaluation Models:  A National Perspective

Technical challenges (cont’d)

• Error rates (in teacher ranking) can be about 25% with three years of student data, and 35% with one year of data. (Schochet & Chiang, 2010)

• Teachers’ scores on subscales of a test can yield very different results, which also raises the question of weighting subscale results. (Lockwood et al., 2007)

• In one study, 21% of teachers in Washington, DC had students who had also been in another math teacher’s class that year (Hock & Isenberg, 2011)

Page 15: Teacher Evaluation Models:  A National Perspective

Responses to technical challenges

• Use multiple years of data to mitigate sorting bias and gain stability in estimates (Koedel & Betts, 2009; McCaffrey et al., 2009; Glazerman et al., 2010)

• Use confidence intervals and other sources of information to improve reliability and validity of teacher effectiveness ratings (Glazerman et al., 2010)

• Have teachers and administrators verify rosters to ensure scores are calculated with students the teachers actually taught

• Consider the importance of subscores in teacher rankings
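
To illustrate the confidence-interval point, the sketch below puts an approximate 95% interval around a teacher's average gap between actual and predicted scores. It is a simplified illustration under stated assumptions: it uses a normal approximation (z = 1.96) on student-level gaps, whereas operational value-added models estimate uncertainty in their own ways. The function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def mean_with_interval(gaps, z=1.96):
    """Return (mean, lower, upper) for student-level actual-minus-predicted gaps."""
    if len(gaps) < 2:
        raise ValueError("need at least two students")
    center = mean(gaps)
    # Standard error of the mean shrinks as more students (or years) are added.
    margin = z * stdev(gaps) / sqrt(len(gaps))
    return center, center - margin, center + margin

def clearly_above_zero(gaps):
    """True only when the whole interval lies above zero."""
    _, lower, _ = mean_with_interval(gaps)
    return lower > 0

# Hypothetical gaps: one year of data versus three years pooled.
one_year = [2, 3, -2, 4]
three_years = [2, 3, -2, 4, 1, 3, 2, 0, 4, 1, 2, 3]
print(mean_with_interval(one_year))
print(mean_with_interval(three_years))
```

With only four students the interval spans zero, while the pooled data gives a narrower interval, which is the intuition behind using multiple years of data.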

Page 16: Teacher Evaluation Models:  A National Perspective

What nearly all state and district models have in common

• Value-added or Colorado Growth Model will be used for those teachers in tested grades and subjects (4-8 ELA & Math in most states)

• States want to increase the number of tested subjects and grades so that more teachers can be evaluated with growth models

• States are generally at a loss when it comes to measuring teachers’ contribution to student growth in non-tested subjects and grades

Page 17: Teacher Evaluation Models:  A National Perspective

Measuring teachers’ contributions to student learning growth: A summary of current models that include non-tested subjects and grades

Model | Description
Student learning objectives | Teachers assess students at beginning of year and set objectives, then assess again at end of year; principal or designee works with teacher, determines success
Subject & grade alike team models (“Ask a Teacher”) | Teachers meet in grade-specific and/or subject-specific teams to consider and agree on appropriate measures that they will all use to determine their individual contributions to student learning growth
Pre- and post-tests model | Identify or create pre- and post-tests for every grade and subject
School-wide value-added | Teachers in tested subjects & grades receive their own value-added score; all other teachers get the school-wide average
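
The school-wide value-added rule can be sketched in a few lines. This is an illustration only: here the "school-wide average" is assumed to be the mean of the scores of teachers who have their own value-added score, which may differ from how a given state or district computes it. The function name and data are hypothetical.

```python
from statistics import mean

def assign_school_wide(scores):
    """Give teachers without their own score (None) the school-wide average."""
    own_scores = [s for s in scores.values() if s is not None]
    if not own_scores:
        raise ValueError("no teacher has an individual value-added score")
    school_average = mean(own_scores)
    return {teacher: (s if s is not None else school_average)
            for teacher, s in scores.items()}

# Hypothetical school: two teachers in tested subjects, one in a non-tested subject.
school = {"grade 5 math": 1.5, "grade 5 ELA": -0.5, "art": None}
print(assign_school_wide(school))
```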

Page 18: Teacher Evaluation Models:  A National Perspective

Recommendation from NBPTS Task Force on measuring teachers’ contribution to student learning growth

Recommendation 2: Employ measures of student learning explicitly aligned with the elements of curriculum for which the teachers are responsible. This recommendation emphasizes the importance of ensuring that teachers are evaluated for what they are teaching. [emphasis added] (Linn et al., 2011)

Page 19: Teacher Evaluation Models:  A National Perspective

Comparison of classroom-based student growth measures

Student Growth Measures: Standards based; Classroom level; Have pre- and post-measures; Rigor can be verified

“Ask A Teacher” (Collaborative) | Student Learning Objectives (Individualized)
Comparable across same subject/grade teachers within a district because teachers discuss and agree to a set of measures to be used | Not comparable because teachers don’t choose the same measures to assess their students’ learning growth
Teachers work together collaboratively to assess their students’ growth | Teachers work on their own to assess their students’ growth (though a “team” SLO or working together may be possible)
District approves measures | Principal approves measures (maybe district, depending on state guidance)
Teachers collaborate on scoring for written work and 4Ps (projects, portfolios, products, and performances) | Individual teachers work with principals to determine student learning growth

Page 20: Teacher Evaluation Models:  A National Perspective

Comparability of measures

• It is not appropriate to use the same measure for every grade and subject
  - A measure that may be valid for one subject/grade may not be valid for another

• Measures should be chosen because they are appropriate for a specific subject and grade, not because they fit a certain format
  - A paper-and-pencil test may be appropriate for some subjects, while performance tests to measure applied knowledge and skills may be appropriate for others

Page 21: Teacher Evaluation Models:  A National Perspective

Measuring teachers’ contributions to student learning growth (classroom)

Page 22: Teacher Evaluation Models:  A National Perspective

Same measures for same subjects/grades

• As much as possible, use the same measure for all teachers in a district in a particular subject/grade
  - This helps prevent score differences based on using a variety of measures
  - Score differences should be based on the teachers’ contribution to student learning growth, not differences in the assessments they’re using

Page 23: Teacher Evaluation Models:  A National Perspective

Using the measures in comparable ways

• Even if all teachers are using the same measures in a grade/subject, they may be using them in different ways
  - Giving the assessment at different times of the year
  - Allowing students more time to complete the assessment
  - Engaging in test prep or coaching students in completing assessments

• To ensure that differences in student scores are based on teacher performance, not on how/when the assessment was given, “standardize” assessment processes as much as possible

Page 24: Teacher Evaluation Models:  A National Perspective

Model highlight: Multiple measures of student learning

Using multiple measures of student learning as evidence of ALL teachers’ contributions to student learning growth

Page 25: Teacher Evaluation Models:  A National Perspective

Rhode Island DOE Model: Framework for Applying Multiple Measures of Student Learning

[Diagram] Student learning rating + Professional practice rating + Professional responsibilities rating → Final evaluation rating

The student learning rating is determined by a combination of different sources of evidence of student learning. These sources fall into three categories:

Category 1: Student growth on state standardized tests (e.g., NECAP, PARCC)

Category 2: Student growth on standardized district-wide tests (e.g., NWEA, AP exams, Stanford-10, ACCESS, etc.)

Category 3: Other local school-, administrator-, or teacher-selected measures of student performance

Page 26: Teacher Evaluation Models:  A National Perspective

Model highlight: Triangulating results for validity

One way New Haven, CT verifies validity of results is through placing scores on a matrix to look for mismatches that may indicate problems (with instruments, training, scoring, etc.) or may point to the need for additional support

Page 27: Teacher Evaluation Models:  A National Perspective

New Haven “matrix”

Asterisks indicate a mismatch—teacher is very high on one area (practice or growth) and very low on the other area.
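
The matrix image itself did not survive in this transcript, but the mismatch idea can be sketched. The example below assumes both practice and growth are rated on a 1-5 scale and treats a gap of three or more points as a mismatch. The scale, the threshold, and the function names are illustrative assumptions, not New Haven's actual rules.

```python
def is_mismatch(practice, growth, min_gap=3):
    """True when the two ratings differ by at least min_gap points."""
    return abs(practice - growth) >= min_gap

def mismatch_matrix(size=5, min_gap=3):
    """Build text rows: practice rating runs top (high) to bottom (low),
    growth rating runs left (low) to right (high)."""
    rows = []
    for practice in range(size, 0, -1):
        cells = ["*" if is_mismatch(practice, growth, min_gap) else "."
                 for growth in range(1, size + 1)]
        rows.append(" ".join(cells))
    return rows

# Print the grid; asterisks mark combinations that would be flagged for review.
for row in mismatch_matrix():
    print(row)
```

Flagged cells sit in the two opposite corners of the grid, where one rating is very high and the other very low.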

Page 28: Teacher Evaluation Models:  A National Perspective

When measures fail to indicate which teachers are effective

• Tendency is to “blame the measure”

• Rather than stating, “It did not work,” consider asking “What did not work?”
  - Insufficient training on scoring, evidence, processes, etc.
  - Implementation problems
  - Lack of understanding of processes on part of teachers, facilitators, evaluators, administrators, etc.

Page 29: Teacher Evaluation Models:  A National Perspective

Considerations

• Consider whether human resources and capacity are sufficient to ensure fidelity of implementation
  - Poor implementation threatens validity of results

• Establish a plan to evaluate measures to determine if they can effectively differentiate among teacher performance
  - Need to identify potential “widget effects” in measures
  - If a measure is not differentiating among teachers, the cause may be faulty training or poor implementation, not the measure itself
  - Examine correlations among results from different measures

• Evaluate processes and data each year and make needed adjustments

• Publish findings of evaluations of both overall system and specific measures
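
Two of the checks above lend themselves to a quick calculation: whether a measure differentiates among teachers, and how strongly two measures correlate. The sketch below is one possible approach under stated assumptions. It uses the share of teachers at the most common rating as a rough "widget effect" signal and a Pearson correlation between two measures; the function names and sample data are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import mean

def share_at_most_common_rating(ratings):
    """Fraction of teachers who received the single most common rating."""
    if not ratings:
        raise ValueError("need at least one rating")
    return Counter(ratings).most_common(1)[0][1] / len(ratings)

def pearson(xs, ys):
    """Pearson correlation between two measures for the same teachers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two teachers")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a measure with no variation cannot be correlated")
    return cov / sqrt(var_x * var_y)

# Hypothetical data for six teachers: observation ratings and growth scores.
observation = [3, 3, 3, 3, 3, 2]
growth = [1.2, -0.4, 0.8, 0.1, -1.0, -0.6]
print(share_at_most_common_rating(observation))
print(round(pearson(observation, growth), 2))
```

A very high share at one rating suggests the measure is not differentiating, which the slide attributes as possibly due to training or implementation rather than the measure itself.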

Page 30: Teacher Evaluation Models:  A National Perspective

Final thoughts

• The limitations:
  - There are no perfect measures
  - There are no perfect models
  - Changing the culture of evaluation is hard work

• The opportunities:
  - Evidence can be used to trigger support for struggling teachers and acknowledge effective ones
  - Multiple sources of evidence can provide powerful information to improve teaching and learning
  - Evidence is more valid than “judgment” and provides better information for teachers to improve practice

Page 31: Teacher Evaluation Models:  A National Perspective

Evaluation System Models that include student learning growth as a measure of teacher effectiveness

Austin (Student learning objectives with pay-for-performance, group and individual SLOs assessed with comprehensive rubric)
http://archive.austinisd.org/inside/initiatives/compensation/slos.phtml

Georgia CLASS Keys (Comprehensive rubric, includes student achievement; see last few pages)
System: http://www.gadoe.org/tss_teacher.aspx
Rubric: http://www.gadoe.org/DMGetDocument.aspx/CK%20Standards%2010-18-2010.pdf?p=6CC6799F8C1371F6B59CF81E4ECD54E63F615CF1D9441A92E28BFA2A0AB27E3E&Type=D

Hillsborough, Florida (Creating assessments/tests for all subjects)
http://communication.sdhc.k12.fl.us/empoweringteachers/

Page 32: Teacher Evaluation Models:  A National Perspective

Evaluation System Models that include student learning growth as a measure of teacher effectiveness (cont’d)

New Haven, CT (SLO model with strong teacher development component and matrix scoring; see Teacher Evaluation & Development System)
http://www.nhps.net/scc/index

Rhode Island DOE Model (Student learning objectives combined with teacher observations and professionalism)
http://www.ride.ri.gov/assessment/DOCS/Asst.Sups_CurriculumDir.Network/Assnt_Sup_August_24_rev.ppt

Teacher Advancement Program (TAP) (Value-added for tested grades only, no info on other subjects/grades, multiple observations for all teachers)
http://www.tapsystem.org/

Washington DC IMPACT Guidebooks (Variation in how groups of teachers are measured: 50% standardized tests for some groups, 10% other assessments for non-tested subjects and grades)
http://www.dc.gov/DCPS/In+the+Classroom/Ensuring+Teacher+Success/IMPACT+(Performance+Assessment)/IMPACT+Guidebooks

Page 33: Teacher Evaluation Models:  A National Perspective

References

Betebenner, D. W. (2008). A primer on student growth percentiles. Dover, NH: National Center for the Improvement of Educational Assessment (NCIEA). http://www.cde.state.co.us/cdedocs/Research/PDF/Aprimeronstudentgrowthpercentiles.pdf

Glazerman, S., Goldhaber, D., Loeb, S., Raudenbush, S., Staiger, D. O., & Whitehurst, G. J. (2011). Passing muster: Evaluating evaluation systems. Washington, DC: Brown Center on Education Policy at Brookings. http://www.brookings.edu/reports/2011/0426_evaluating_teachers.aspx#

Herman, J. L., Heritage, M., & Goldschmidt, P. (2011). Developing and selecting measures of student growth for use in teacher evaluation. Los Angeles, CA: University of California, National Center for Research on Evaluation, Standards, and Student Testing (CRESST). http://www.aacompcenter.org/cs/aacc/view/rs/26719

Hock, H., & Isenberg, E. (2011). Methods for accounting for co-teaching in value-added models. Princeton, NJ: Mathematica Policy Research. http://www.aefpweb.org/sites/default/files/webform/Hock-Isenberg%20Co-Teaching%20in%20VAMs.pdf

Koedel, C., & Betts, J. R. (2009). Does student sorting invalidate value-added models of teacher effectiveness? An extended analysis of the Rothstein critique. Cambridge, MA: National Bureau of Economic Research. http://economics.missouri.edu/working-papers/2009/WP0902_koedel.pdf

Linn, R., Bond, L., Darling-Hammond, L., Harris, D., Hess, F., & Shulman, L. (2011). Student learning, student achievement: How do teachers measure up? Arlington, VA: National Board for Professional Teaching Standards. http://www.nbpts.org/index.cfm?t=downloader.cfm&id=1305

Lockwood, J. R., McCaffrey, D. F., Hamilton, L. S., Stecher, B. M., Le, V.-N., & Martinez, J. F. (2007). The sensitivity of value-added teacher effect estimates to different mathematics achievement measures. Journal of Educational Measurement, 44(1), 47-67. http://www.rand.org/pubs/reprints/RP1269.html

Page 34: Teacher Evaluation Models:  A National Perspective

References (continued)

McCaffrey, D., Sass, T. R., Lockwood, J. R., & Mihaly, K. (2009). The intertemporal stability of teacher effect estimates. Education Finance and Policy, 4(4), 572-606. http://www.mitpressjournals.org/doi/abs/10.1162/edfp.2009.4.4.572

Newton, X. A., Darling-Hammond, L., Haertel, E., & Thomas, E. (2010). Value-added modeling of teacher effectiveness: An exploration of stability across models and contexts. Education Policy Analysis Archives, 18(23). http://epaa.asu.edu/ojs/article/view/810

Rivkin, S. G., Hanushek, E. A., & Kain, J. F. (2005). Teachers, schools, and academic achievement. Econometrica, 73(2), 417-458. http://www.econ.ucsb.edu/~jon/Econ230C/HanushekRivkin.pdf

Sanders, W. L., & Horn, S. P. (1998). Research findings from the Tennessee Value-Added Assessment System (TVAAS) Database: Implications for educational evaluation and research. Journal of Personnel Evaluation in Education, 12(3), 247-256. http://www.sas.com/govedu/edu/ed_eval.pdf

Schochet, P. Z., & Chiang, H. S. (2010). Error rates in measuring teacher and school performance based on student test score gains. Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. http://ies.ed.gov/ncee/pubs/20104004/pdf/20104004.pdf

Weisberg, D., Sexton, S., Mulhern, J., & Keeling, D. (2009). The widget effect: Our national failure to acknowledge and act on differences in teacher effectiveness. Brooklyn, NY: The New Teacher Project. http://widgeteffect.org/downloads/TheWidgetEffect.pdf

Page 35: Teacher Evaluation Models:  A National Perspective

Questions?

Page 36: Teacher Evaluation Models:  A National Perspective

Laura Goe, Ph.D.
609-734-1076
[email protected]

National Comprehensive Center for Teacher Quality
1100 17th Street NW, Suite 500
Washington, DC 20036-4632
877-322-8700 > www.tqsource.org