Voices: Can evaluations do more than sort teachers?

Colorado education researcher Robert Reichardt says newly available evidence from two economists suggests the act of evaluating teachers can result in improved performance.

One of the key assumptions behind Senate Bill 10-191, the educator effectiveness policy, is that teacher evaluations can lead to improved teacher effectiveness. The research base on this assumption is not particularly deep. One of the strongest studies on this issue comes from a pair of economists, Eric Taylor and John Tyler. They looked at the relationship between being evaluated and subsequent teacher value-added using data from a long-running evaluation system in Cincinnati. Until recently, their research was hidden behind a paywall at the National Bureau of Economic Research, or NBER. However, a new summary of their work is available at Education Next.

Their key finding is that being evaluated was associated with improved teacher effectiveness. An average student’s math scores were 4.5 percentile points higher in the year after a teacher was evaluated than in the year before the evaluation. The results suggest that the process of providing teachers with feedback can lead to improved teacher effectiveness. In other words, teacher evaluation systems can have benefits beyond identifying highly rated teachers for rewards and exiting those with low ratings.

Teacher effectiveness continued to improve in the years following the evaluation – contrary to the notion that effectiveness improvements plateau after the first few years of teaching. Teachers with the lowest evaluation scores tended to improve the most – suggesting at least some evaluations had a motivational effect.

These improvements occurred despite the fact that 90 percent of teachers were ultimately rated either “proficient” or “distinguished” (the other, lower performance levels were “basic” and “unsatisfactory”). This is important because many researchers and policymakers have argued an effective evaluation system needs to have a wider distribution (i.e., a larger proportion of teachers should be rated lower). Taylor and Tyler do say that there was more variation on sub-scales within the observation rubric and between observations (the final rating was a product of four observations). The study also does not mention using the evaluation ratings to target teacher professional development.

This study does have limits

There are several important limitations to the study. First, it focused only on mid-career teachers in grades four through eight who, on average, were evaluated once every five years. Second, no relationship was found between evaluations and subsequent teacher effectiveness in reading.

Equally important, it still leaves many unanswered questions for those designing Colorado’s new evaluation systems. In particular:

  • What aspects of the Cincinnati system were central to its success? Was it the four observations in one year, or the four performance levels?
  • Was it the use of trained peer observers for three of those observations?
  • Was it that three of the four observations were unannounced?
  • Was it basing the evaluation rubrics on Charlotte Danielson’s Framework (compared to Colorado’s use of the North Carolina framework)?
  • Was it the risk of being placed on an improvement plan for low-performing teachers, or the promotion opportunities that opened for some teachers who received higher ratings?

Finally, Cincinnati’s system was based entirely on observations. What lessons does this hold for Colorado’s system, where 50 percent of the final rating will be based upon student growth?

One thing the study does make clear is that the evaluation system is not cheap. It cost approximately $7,500 per teacher evaluated, with most of the money spent on peer evaluators. This is a similar amount to the estimated yearly spending per teacher on professional development.

As a state, we are investing a lot of energy to implement a new teacher evaluation system. One study does not make a strong research base, nor does it provide answers to the myriad of design questions faced by system developers. However, this study does suggest this investment will have a payoff.

About our First Person series:

First Person is where Chalkbeat features personal essays by educators, students, parents, and others trying to improve public education.