Future of Teaching

In Jeffco, a school where teacher evaluations are a team effort

When North Arvada Middle School started an overhaul of its teacher evaluations three years ago, Barbara Aswege could not have been more opposed.

“I was the dragon lady,” said Aswege, who teaches social studies. She objected to observations, ignored feedback and fought the school administration every step of the way.

But as the end of the year approached, she noticed something: her teaching hadn’t improved at all.

“I wasn’t getting anywhere,” she said. So she started to reconsider her position, asking for books to read over the summer.

Aswege’s attitude adjustment is one that state and district officials hope to see replicated across the state with the rollout of Senate Bill 191, which governs how teachers and other school staff are evaluated. That law, which went into full effect this year, mandates more frequent classroom observations intended to assess teachers’ practices on an extensive list of standards and the inclusion of student test scores in year-end evaluations. While proponents say the law is intended to help teachers improve, many districts have struggled to provide teachers with the additional help and training needed to get better.

At North Arvada, by contrast, teachers are evaluated frequently but also receive lots of support, including regular meetings with a trainer and a team of teachers who help each other with curriculum and instruction.

The primary goal, says North Arvada’s principal Dana Ellis, is to help teachers get better if they can.

“If you don’t have a structure and system built in a school, teachers don’t have much of a chance,” said Ellis.

As part of a Jefferson County School District pilot, teams of teachers at North Arvada set goals for student learning that they feel are reasonable, take more time to plan their teaching and receive far more support and feedback than in a more traditional system. In return, they are expected to deliver on the goals they set or they risk losing out on a year-end bonus of up to $15,000.

While the impact on student learning is still unclear, Aswege and other North Arvada teachers say the overhaul drove a radical shift in the way they teach. On a recent afternoon, two years after Aswege’s vocal protests began to peter out, she welcomed three observers into her room to give notes on a lesson plan that required students to give each other feedback on their writing.

“I’m constantly begging them to come in my room,” Aswege said. Ellis and a team of trained former teachers observe individual teachers at the school as many as 20 times in a semester.

After teachers get observed, they follow up with conversations with coaches and master teachers on how to respond to the criticism they receive. In Aswege’s case, that meant coming up with a way to teach students how to give feedback by modeling it herself and videotaping particularly successful conversations between students.

Barbara Aswege works through the feedback she received with her master teacher, Shareen Connors.

In addition to changing the way individual teachers teach, the program has driven a change in the way teachers and administrators spend their time. Ellis and her team have overhauled the daily workings of the school to create more time for teachers to work together, to plan ahead for instruction and reflect on their teaching. And administrators are asked to place spending time in classrooms and supporting teachers at the very top of their priorities.

On a recent Thursday afternoon, Ellis rearranged the school day so groups of teachers could plan for next school year. A group of language arts teachers were in the midst of mapping out exactly what students should learn next year — and how to measure whether their students learned it. Across the building, the math team reflected on the successes and failings of that school year.

Ellis said that planning time is all in service of the law’s first objective: helping teachers become better at instruction. In order to improve, teachers need an environment where they can try new things and see if they succeed or fail, and bouncing ideas off their peers is a big part of that.

In spite of the school staff’s enthusiasm for the changes at North Arvada, they aren’t likely to go statewide anytime soon. For one, the pilot was funded through a federal grant that runs out next year and schools like North Arvada aren’t yet sure how they will continue their work after it does.

For another, the district’s preliminary findings indicate that the program’s success still depends on the person in the principal’s chair. At schools where leadership was weak, a preliminary report found teachers were less likely to seek out ways to improve and to report that the team collaborations were useful.

And the end goal, improvements in student performance, has not yet been achieved. While district officials caution it may still be too early to tell, the district has not seen a significant impact on student achievement.

Still, it has given principals like Ellis a way to quickly assess whether a teacher is up for the challenge and make informed decisions about hiring and firing — also a key objective of the new evaluation system’s architects.

That has proved the more contested half of the law. In neighboring Denver Public Schools, teachers say elements of the law have been used to punish those who speak out to administrators and push out more experienced teachers. The Denver teachers’ union filed suit to have that provision eliminated from the law.

So far, that controversy doesn’t exist in Jeffco. But teachers have felt the effect of the system in other ways, especially those who did not receive their anticipated bonus.

“It’s been a mental shift for some schools,” said Ashley Kelley, one of the pilot’s trained observers. “If you’re not making growth, you’re not going to get that payout.”

But Ellis said that the new system helps her be very clear about the expectations for her teachers from the day they walk in the door.

Ellis said she puts teachers entering the school or switching classrooms on what she calls a “steep learning curve,” with intensive supports along the way. She expects to see results within a matter of months. For example, a teacher whose students had made only incremental progress on their learning goals for months on end was dismissed mid-year. The teacher’s replacement? She saw a ten percent jump in just three months.

“If you have a marginal or better teacher, they can handle it,” Ellis said. “Marginal or worse, they can’t.”

fight another day

In union defeat, lawmakers end session without revamping teacher evaluation law

After a hard-fought battle by the state teachers union, New York lawmakers went home for the summer without overhauling a controversial teacher evaluation law that ties state test scores to educator ratings.

The bill pushed by the unions would have left decisions about whether to use state test scores in teacher evaluations up to local union negotiations. While the bill cleared the Assembly, it was bottled up by the Senate’s leadership, which demanded charter school concessions in return that Assembly Democrats wouldn’t agree to.

The effort to decouple test scores from teacher evaluations was one of several that fizzled out at the end of a lackluster session characterized by lawmaker gridlock.

“Sen. Flanagan, his caucus and five Democrats chose to betray the state’s teachers,” said New York State United Teachers President Andy Pallotta in a statement. “Make no mistake, New York teachers, parents and public school students will remember which senators voted against their public schools when we head to the polls this September and again in November.”

There is some possibility that lawmakers could return to finish a few unresolved issues this summer, but Pallotta told Chalkbeat he is not holding out hope for that outcome.

The lack of action is a defeat for the state teachers union, which has fought hard for the bill since the beginning of the session. Union officials have staged musical rallies, bought balloons, rented a truck with a message urging lawmakers to pass the bill, and capped off the last day of session by handing out ice cream for the cause.

However, the legislative loss gives the union something to rally around during this fall’s elections. Also, other education advocacy organizations are content to engage in a longer process to revamp evaluations.

“Inaction isn’t always the worst outcome,” said Julie Marlette, Director of Governmental Relations for the New York State School Boards Association. “Now we can continue to work with both legislative and regulatory figures to hopefully craft an update to evaluations that is thoughtful and comprehensive and includes all the stakeholders.”

The news also means that New York’s teacher evaluation saga, which has been raging for eight years, will spill over into at least next year. Policymakers have been battling about state teacher evaluations since 2010, when New York adopted a system that started using state test scores to rate teachers in order to win federal “Race to the Top” money.

Teacher evaluations were altered again in 2015 when Gov. Andrew Cuomo called for a more stringent evaluation system, saying evaluations as they existed were “baloney.” The new system was met with resistance from the teachers unions and parents across the state. Nearly one in five families boycotted state tests in response to evaluation changes and a handful of other education policies.

The state’s Board of Regents acted quickly, passing a moratorium on the use of grades three to eight math and English tests in teacher evaluations. But the original 2015 law remains on the books. During this session, the unions targeted a central plank of that law: a provision that could require as much as half of an educator’s evaluation to be based on test scores.

With the moratorium set to expire in 2019, the fight over teacher evaluations will likely become more pressing next year. It may also allow the state education department to play a greater role in shaping the final product. State education department officials had begun to lay out a longer roadmap for redesigning teacher evaluations that involved surveys and workgroups, but the legislative battle threatened to short-circuit their process.

Now officials at the state education department say they will restart their work, and they point out that they could extend the moratorium to provide extra time if needed.

“We will resume the work we started earlier this year to engage teachers, principals and others as we seek input in moving toward developing a new educator evaluation system,” said state education department spokeswoman Emily DeSantis.

For some education advocates, slowing down the process sounds like a good idea.

“Our reaction on the NYSUT Assembly teacher evaluation bill is that you could do worse but that you could also do better and that we should take time to try,” said Bob Lowry, deputy director of the New York State Council of School Superintendents.

What seems to be a setback for the union now may be a galvanizing force during elections this fall. Republican lawmakers will likely struggle to keep control of the state Senate, and NYSUT is promising to use this inaction against them. That could be particularly consequential on Long Island, which is a hotbed of the testing opt-out movement.

It’s unclear whether the failure to act will prove problematic for Cuomo, who is also seeking re-election. Cuomo, who pushed for the 2015 law the unions despise, is facing competition from the left in gubernatorial challenger Cynthia Nixon.

But at least so far, it seems like the union is reserving the blame for Senate Republicans and not for the governor.

Cuomo is “making it clear that he has heard the outcry,” said Pallotta. “I blame Senator Flanagan, I blame his conference and I blame 5 [Senate] Democrats.”

a high-stakes evaluation

The Gates Foundation bet big on teacher evaluation. The report it commissioned explains how those efforts fell short.

PHOTO: Brandon Dill/The Commercial Appeal
Sixth-grade teacher James Johnson leads his students in a gameshow-style lesson on energy at Chickasaw Middle School in 2014 in Shelby County. The district was one of three that received a grant from the Gates Foundation to overhaul teacher evaluation.

Barack Obama’s 2012 State of the Union address reflected the heady moment in education. “We know a good teacher can increase the lifetime income of a classroom by over $250,000,” he said. “A great teacher can offer an escape from poverty to the child who dreams beyond his circumstance.”

Bad teachers were the problem; good teachers were the solution. It was a simplified binary, but the idea and the research it drew on had spurred policy changes across the country, including a spate of laws establishing new evaluation systems designed to reward top teachers and help weed out low performers.

Behind that effort was the Bill and Melinda Gates Foundation, which backed research and advocacy that ultimately shaped these changes.

It also funded the efforts themselves, specifically in several large school districts and charter networks open to changing how teachers were hired, trained, evaluated, and paid. Now, new research commissioned by the Gates Foundation finds scant evidence that those changes accomplished what they were meant to: improve teacher quality or boost student learning.  

The 500-plus page report by the Rand Corporation, released Thursday, details the political and technical challenges of putting complex new systems in place and the steep cost — $575 million — of doing so.

The post-mortem will likely serve as validation to the foundation’s critics, who have long complained about Gates’ heavy influence on education policy and what they call its top-down approach.

The report also comes as the foundation has shifted its priorities away from teacher evaluation and toward other issues, including improving curriculum.

“We have taken these lessons to heart, and they are reflected in the work that we’re doing moving forward,” the Gates Foundation’s Allan Golston said in a statement.

The initiative did not lead to clear gains in student learning.

Results were spotty at the three districts and four California-based charter school networks that took part in the Gates initiative: Pittsburgh; Shelby County (Memphis), Tennessee; Hillsborough County, Florida; and the Alliance-College Ready, Aspire, Green Dot, and Partnerships to Uplift Communities networks. The trends over time didn’t look much better than those at similar schools in the same state.

Several years into the initiative, there was evidence that it was helping high school reading in Pittsburgh and at the charter networks, but hurting elementary and middle school math in Memphis and among the charters. In most cases there were no clear effects, good or bad. There was also no consistent pattern of results over time.

A complicating factor here is that the comparison schools may also have been changing their teacher evaluations, as the study spanned from 2010 to 2015, when many states passed laws putting in place tougher evaluations and weakening tenure.

There were also lots of other changes going on in the districts and states — like the adoption of Common Core standards, changes in state tests, the expansion of school choice — making it hard to isolate cause and effect. Studies in Chicago, Cincinnati, and Washington D.C. have found that evaluation changes had more positive effects.

Matt Kraft, a professor at Brown who has extensively studied teacher evaluation efforts, said the disappointing results in the latest research couldn’t simply be chalked up to a messy rollout.

These “districts were very well poised to have high-quality implementation,” he said. “That speaks to the actual package of reforms being limited in its potential.”

Principals were generally positive about the changes, but teachers had more complicated views.

From Pittsburgh to Tampa, Florida, the vast majority of principals agreed at least somewhat that “in the long run, students will benefit from the teacher-evaluation system.”


Teachers in district schools were far less confident.

When the initiative started, a majority of teachers in all three districts tended to agree with the sentiment. But several years later, support had dipped substantially. The early support may have reflected dissatisfaction with the previous system (the researchers note that “many veteran [Pittsburgh] teachers we interviewed reported that their principals had never observed them”), and the later decline may have reflected growing disillusionment with the new one.

Majorities of teachers in all locations reported that they had received useful feedback from their classroom observations and changed their habits as a result.

At the same time, teachers in the three districts were highly skeptical that the evaluation system was fair — or that it made sense to attach high-stakes consequences to the results.

The initiative didn’t help ensure that poor students of color had more access to effective teachers.

Part of the impetus for evaluation reform was the idea, backed by some research, that black and Hispanic students from low-income families were more likely to have lower-quality teachers.  

But the initiative didn’t seem to make a difference. In Hillsborough County, inequity expanded. (Surprisingly, before the changes began, the study found that low-income kids of color actually had similar or slightly more effective teachers than other students in Pittsburgh, Hillsborough County, and Shelby County.)

Districts put in place modest bonuses to get top teachers to switch schools, but the evaluation system itself may have been a deterrent.

“Central-office staff in [Hillsborough County] reported that teachers were reluctant to transfer to high-need schools despite the cash incentive and extra support because they believed that obtaining a good VAM score would be difficult at a high-need school,” the report says.

Evaluation was costly — both in terms of time and money.

The total direct cost of all aspects of the program, across several years in the three districts and four charter networks, was $575 million.

That amounts to between 1.5 and 6.5 percent of district or network budgets, or a few hundred dollars per student per year. Over a third of that money came from the Gates Foundation.

The study also quantifies the strain of the new evaluations on school leaders’ and teachers’ time as costing upwards of $200 per student, nearly doubling the price tag in some districts.

Teachers tended to get high marks on the evaluation system.

Before the new evaluation systems were put in place, the vast majority of teachers got high ratings. That hasn’t changed much, according to this study, which is consistent with national research.

In Pittsburgh, in the initial two years, when evaluations had low stakes, a substantial number of teachers got low marks. That drew objections from the union.

“According to central-office staff, the district adjusted the proposed performance ranges (i.e., lowered the ranges so fewer teachers would be at risk of receiving a low rating) at least once during the negotiations to accommodate union concerns,” the report says.

Morgaen Donaldson, a professor at the University of Connecticut, said the initial buy-in followed by pushback isn’t surprising, pointing to her own research in New Haven.

To some, aspects of the initiative “might be worth endorsing at an abstract level,” she said. “But then when the rubber hit the road … people started to resist.”

More effective teachers weren’t more likely to stay teaching, but less effective teachers were more likely to leave.

The basic theory of action of evaluation changes is to get more effective teachers into the classroom and keep them there, while getting less effective ones out or helping them improve.

The Gates research found that the new initiatives didn’t get top teachers to stick around any longer. But there was some evidence that the changes made lower-rated teachers more likely to leave. Fewer than 1 percent of teachers were formally dismissed in the places where data was available.

After the grants ran out, districts scrapped some of the changes but kept a few others.

One key test of success for any foundation initiative is whether it is politically and financially sustainable after the external funds run out. Here, the results are mixed.

Both Pittsburgh and Hillsborough have ended high-profile aspects of their programs: the merit pay system and the use of peer evaluators, respectively.

But other aspects of the initiative have been maintained, according to the study, including the use of classroom observation rubrics, evaluations that use multiple metrics, and certain career-ladder opportunities.

Donaldson said she was surprised that the peer evaluators didn’t go over well in Hillsborough. Teachers unions have long promoted peer-based evaluation, but district officials said that a few evaluators who were rude or hostile soured many teachers on the concept.

“It just underscores that any reform relies on people — no matter how well it’s structured, no matter how well it’s designed,” she said.

Correction: A previous version of this story stated that about half of the money for the initiative came from the Gates Foundation; in fact, the foundation’s share was 37 percent or about a third of the total.