Teacher evaluation pilot hints at strengths, weaknesses

More than nine out of 10 Colorado teachers evaluated during a pilot test of the state’s educator effectiveness system were rated proficient or higher on the system’s five-step rating scale.

But teachers performed less well on content knowledge and facilitating learning, the two professional standards most directly related to student achievement. On those standards, 87 percent of teachers were rated proficient or higher.

And on one element of content knowledge, literacy development, only two-thirds of teachers received proficient ratings or better.

These results of the 2012-13 evaluation pilot program were presented to the State Board of Education on Wednesday.

Britt Wilkinfeld, a CDE data analyst who is studying the results of the pilot, told the board, “We think some of the preliminary findings are exciting” but cautioned against reading too much into the pilot data. “These are all preliminary findings. … We shouldn’t jump to conclusions; there are so many possible interpretations.”

Noting the relatively low performance on literacy development, Wilkinfeld said, “That’s a flag for us.”

The overwhelming share of teachers in the pilot who received proficient ratings could raise questions about whether the new system differentiates enough between the state’s strongest and weakest teachers. The new evaluation system was intended both to give teachers more information about how they can improve their effectiveness and, eventually, to help districts better identify their highest and lowest performers for personnel decisions, moving beyond a system in which the vast majority of teachers are rated simply “satisfactory.”

But it’s unclear how precisely the pilot results will predict what the range of ratings will be once the new evaluation system has been fully implemented. For one thing, the test evaluations covered only a small slice of the state’s approximately 50,000 teachers. Some 1,900 teachers from 164 schools in 25 districts were evaluated during the pilot program.

And those teachers were evaluated only on professional practice, which will account for just half of the full evaluation system. Senate Bill 10-191, the landmark law that led to the new evaluations, requires that the other half of teacher (and principal) evaluations be based on student academic growth as measured by performance on statewide tests and a wide variety of other student assessments. The academic growth half of the system is being rolled out this year, but academic growth data about the pilot evaluations will be available in October.

The professional practice half of the evaluation system, whose results are included in the pilot, includes five “standards” – content knowledge, establish environment, facilitate learning, reflect on practice and demonstrate leadership.

Each standard has between three and eight more specific “elements” on which teachers are evaluated. Each teacher evaluated during the pilot was rated in one of five categories – not evident, partially proficient, proficient, accomplished or exemplary.

When the full evaluation system is in place, a teacher’s professional practice ratings will be combined with the student growth data to place a teacher in one of four categories – ineffective, partially effective, effective or highly effective.

Here’s a summary of pilot evaluation results on the five standards:

Content knowledge – 87 percent rated proficient, accomplished or exemplary
Establish environment – 92 percent proficient or higher
Facilitate learning – 87 percent proficient or higher. (Four of the lowest-rated elements – human development, use of technology, high expectations and use of assessments – were in this standard.)
Reflect on practice – 87 percent proficient or higher
Leadership – 92 percent proficient or higher

Department of Education analysis of the pilot results found that results varied widely across districts (which weren’t individually identified).

Elementary teachers performed the best, followed by middle school and then high school teachers; non-probationary teachers performed better than probationary teachers; and “on average, teachers with higher ratings have more years of experience and earn a higher salary.”

The department also concluded that “the distributions of teacher ratings across elements and Quality Standards indicate that the professional practice rubric captures multiple aspects of teaching as well as differences in teacher practice” and that “the variability in the distribution of ratings suggests that principals (or other teacher evaluators) are able to differentiate between teachers and assign ratings in a meaningful way.”

A slide show prepared for presentation to the board had some upbeat conclusions about the pilot:

“It can be done! It is time-consuming at first, but all learning processes are and our pilots say it is worth it.”
“We know it is stressful and a lot of work but our pilots recommend: ‘Just get started!’ Once districts start the process it becomes less daunting.”
“There’s power in this process! It’s changing conversations about professional practice across the state.”

The pilot also prompted CDE to make some changes in the evaluation process, including changing the “not evident” category to “basic,” shortening the evaluation rubric (score sheet) by six pages, tweaking language, eliminating redundancies and removing some non-observable practices.

Board members didn’t have a lot to say about the report – it came at the end of a long and tiring day.

Board chair Paul Lundeen, noting the large number of districts that are using the state’s model evaluation system, called that “increased centralization … I’ve got some concerns about that.”

Lundeen and fellow Republican member Debora Scheffel frequently express concerns about what they see as increased centralized control of education by both the state and federal governments.

There were 15 pilot districts, mostly smaller ones but also including St. Vrain. Elements of the system also were tested in 12 other “integration” districts organized by the Colorado Legacy Foundation. (Those districts also did implementation work on the state’s new content standards.) That list included several smaller districts as well as the Thompson and Eagle County districts.

The key elements of SB 10-191 require:

Annual evaluations for all principals, teachers and other licensed personnel
Half of an educator’s evaluation to be based on student growth
Educators to earn three consecutive years of effective ratings to be granted non-probationary status
Revocation of non-probationary status if an educator receives two consecutive years of ineffective ratings
Mutual consent hiring practices, requiring the agreement of the principal and teacher before a job placement can be made

CDE has created a model evaluation system that districts can use. Districts also can use their own systems if they meet overall state standards.

A bit testy about testing

The board Wednesday also got an update on coming changes in the state testing system. (For background see this slide show used by CDE testing chief Joyce Zurkowski and this recent EdNews story.)

Board member Angelika Schroeder raised concerns – as she has before – that districts aren’t ready for online testing. “I have been waiting for almost a year for some kind of a report on what is the state of technology district by district.”

Zurkowski noted that districts aren’t required to submit technology-readiness information to CDE, although the department is seeking that information. “I don’t believe I will ever be able to look at you and say here are all 178 districts.” She said department estimates show 98 percent of schools can finish online testing in three weeks.

Scheffel, who has an academic and professional background in assessment, said she’s worried that not enough is known about the nuts and bolts of the new tests being developed by the Partnership for Assessment of Readiness for College and Careers. “How can we get much greater involvement in this process?” including review of test questions, she asked.

Member Elaine Gantz Berman interjected, “What Deb is asking for is so minute … that is the staff role … that is not the role of a board member.”

That prompted a bit more back and forth until the board agreed to defer discussion on the issue until a later date.