Are Children Learning

Explaining the ISTEP debate: 6 reasons why the test ballooned

PHOTO: Alan Petersime
Repeated problems with ISTEP have lawmakers looking for solutions.

The Indiana legislature is moving fast to cut at least three hours from the state ISTEP after two weeks of sharp words and behind-the-scenes negotiations over its length. Lawmakers are expected to rush a bill through both houses for the governor to sign next week to make the changes.

But with kids just days away from taking the exam, some are still asking: What caused the blowup?

The answer is a little complicated, but here are six reasons why ISTEP more than doubled in length from last year:

1. When standards change, tests must also change.

A big fight over Indiana’s academic standards last year ended when the state rapidly changed course and adopted quickly assembled new standards.

That disrupted a carefully coordinated plan, in place since 2010, for Indiana to adopt the Common Core standards along with 45 other states and use a shared exam whose results would be comparable across the country.

When Gov. Mike Pence and state Superintendent Glenda Ritz were elected in 2012, Indiana had already adopted Common Core. Schools were putting it in place grade by grade, and a new Common Core-linked exam was scheduled to replace ISTEP this year.

But Pence was wary of the shared test, called the Partnership for Assessment of Readiness for College and Careers, or PARCC, and in 2013 ordered the state to withdraw from the consortium creating it. Six months later, both Pence and Ritz supported the idea of Indiana dropping out of Common Core and endorsed new locally made standards that were adopted last April.

Like Common Core, Indiana’s new academic standards are more in-depth and ask students to do more analysis and critical thinking.

A test matching those expectations was needed in a hurry. Instead of taking years to adapt to the new standards and create the new exam, Indiana tried to do the whole process in a matter of months. That meant asking a lot of the 2015 ISTEP.

2. This year’s test had two extra goals — add questions to match the new standards and help create a test to replace ISTEP in 2016.

More difficult standards naturally meant Indiana needed a more difficult test. But there wasn’t time to completely overhaul ISTEP this year.

Instead, ISTEP was modified for this year to add several extra features. Many of the new standards were similar to the old standards, so many questions roughly matched the style and difficulty of past ISTEP exams. But new questions were added to also test students on new, tougher concepts included in the new standards, which were designed to make sure they graduate high school ready for college and careers.

The online version of ISTEP, for example, includes more advanced testing methods that ask kids to not only answer multiple-choice questions, but also answer questions in new ways, such as by dragging and dropping points on a graph or using drop-down menus.

Finally, this year’s ISTEP had one more job: Try out some questions that could be used on the 2016 exam.

But there was a problem. Indiana law requires release each year of all essay or short-answer test questions that are used in scoring. This would turn out to be a big factor in the length of the test.

3. A huge number of questions on this year’s test actually don’t count in a student’s score.

When test questions are released to the public they are effectively retired. They can never be used again on ISTEP.

So this year’s exam included two big sets of essay and short-answer questions: one that counted toward each student’s score and had to be released, and a larger second set being tried out for use in 2016 that wouldn’t count.

Trying out questions is important. Test makers examine how students perform on them to look for surprises: Was a question harder or easier for students than predicted? Was there reason to believe it confused children? Was there any evidence it was unfair to certain groups of students?

Trying out enough questions to be able to make a completely new test for 2016 was the main factor that caused what is normally a six-hour test to swell to more than 12 hours this year. All along, however, this was intended as a one-year problem. Future state exams are expected to be only slightly longer than the six-hour tests of the past.

The legislature appears poised to waive for one year the requirement that all essay and short-answer questions be released. This would allow some of this year’s questions to be reused so there could be far fewer extra questions that don’t count.

4. A longer test means more school days devoted to testing.

Indiana students don’t take all of ISTEP at once. They take sections of the exam in smaller doses over several days.

At its Feb. 4 meeting, the state board increased the number of days schools are allowed to use to give the test. The tests will be given over the course of almost a month, beginning Feb. 25 and ending in late March, followed by another set of testing days over three weeks at the end of April into May.

Schools can choose how to split up the parts of the test. Students might take just one section per day or do more, depending on what teachers and principals decide. Danielle Shockey, the state’s deputy superintendent, said a testing day could take many shapes. In some schools, students take one 35-minute test section each day. In others, they spend an hour each day on testing. Some schools may do more.

“They have a long window of time,” Shockey said. “They can take one session a day if they so choose. It’s a local choice.”

5. Test makers had to consider that ISTEP plays a critical role in school A-to-F grades and teacher evaluation ratings.

ISTEP is used to measure two things: how much students know of the content they were expected to learn this year, and how much they’ve improved from a previous year. Both factor into how Indiana measures the quality of schools with its A-to-F grading system, as well as how it evaluates teachers.

To determine a school’s A-to-F grade, the state considers both the percentage of students who pass ISTEP and how much students improved from last year. For teachers, the state expects to see their students’ test scores improve over the prior year.

When tests are roughly the same each year — measuring the same standards and using similar types of questions — it is easier to gauge how much students improved from the prior year. But when the standards change and the questions are crafted differently, test makers have to add extra questions to help determine each student’s improvement from the last test.

This spring’s test will include a few questions in English and math that are specifically designed to estimate roughly on what grade level each student best fits. For example, a fourth grade test might include a few third grade level questions and a few fifth grade level questions. Some students might do well on only the third grade questions but poorly on harder questions. Others might do well on all the questions, even the more challenging fifth grade questions.

Those extra questions help the test makers better estimate whether a student improved a little, a lot or not at all over the prior year. They also lengthen the test, but only by minutes, not hours, said Michele Walker, testing director for the education department. The legislature agreed they were worth keeping: those questions will remain under the plan to shorten ISTEP.

6. Then, there’s the social studies question.

The federal No Child Left Behind Act, signed into law by President Bush in 2002, requires states to test students in English and math each year in grades 3 to 8, and once in high school, and also in science once during elementary, middle and high school.

Noticeably absent? Social studies.

Indiana’s social studies ISTEP is given only to fifth- and seventh-graders each year, accounting for about an hour of testing for those grades. Even so, Pence’s test consultants recommended cutting the subject to further reduce testing time. Because social studies is required only by state law, not federal law, the legislature could make an exception for this year.

State board members were divided on this idea. Some worried that it would send the message that social studies is not important. Others argued one hour for just two grades doesn’t add much test taking time.

But the legislature liked the idea of reducing test time further this way, so the Indiana Department of Education has told schools to expect the social studies exam to be optional this year. Some students will take it, if their school decides they should; others will be allowed to skip it for this year only.

Measuring Up

After criticism, Denver will change the way it rates elementary schools

PHOTO: Denver Post file
Eva Severance, a first-grader, concentrates on a reading lesson at Lincoln Elementary in Denver.

Facing criticism that its school ratings overstated young students’ reading abilities, the Denver school district announced it will change the way elementary schools are rated next year.

The district will increase the number of students in kindergarten, first, second, and third grade who must score at grade-level on early literacy tests for a school to earn points on the district’s rating scale, and decrease how many points those scores will be worth, officials said.

The changes will lessen the impact of early literacy scores on a school’s overall rating, while also raising the bar on how many students must ace the tests for a school to be considered good. Denver rates schools on a color-coded scale from blue (the highest) to red (the lowest).

“We want to see more students making more progress,” Superintendent Tom Boasberg said.

Local civil rights groups, elected officials, educators, and education advocates criticized Denver Public Schools this year for misleading students and families with what they characterized as inflated school ratings based partly on overstated early literacy gains.

“At a time when this country is at war on truth, we have an obligation to Denver families to give them a true picture of their schools’ performance,” state Sen. Angela Williams, a Denver Democrat, told Boasberg and the school board at a meeting in December.

The groups had asked the district to revise this year’s ratings, which were issued in October. Boasberg refused, saying, “If you’re going to change the rules of the game, it’s certainly advisable to change them before the game starts.” That’s what the district is doing for next year.

The state requires students in kindergarten through third grade to take the early literacy tests as a way to identify the students struggling most to learn to read so they can get extra help. Research shows third-graders who don’t read proficiently are four times as likely to drop out of high school. In Denver, most schools administer an early literacy test called iStation.

The state also requires students in third through ninth grade to take a literacy test called PARCC, which is more rigorous. Third-graders are the only students who take both tests.

The issue is that many third-graders who scored well on iStation did not score well on PARCC. At Castro Elementary in southwest Denver, for example, 73 percent of third-graders scored at grade-level or above on iStation, but just 17 percent did on PARCC.

Denver’s school ratings system, called the School Performance Framework, or SPF, has always relied heavily on state test scores. But this year, the weight given to the early literacy scores increased from 10 percent to 34 percent of the overall rating because the district added points for how well certain groups, such as students from low-income families, did on the tests.

That added weight, plus the discrepancy between how third-graders scored on PARCC and how they scored on iStation, raised concerns about the validity of the ratings.

At a school board work session earlier this week, Boasberg called those concerns “understandable.” He laid out the district’s two-pronged approach to addressing them, noting that the changes planned for next year are a stop-gap measure until the district can make a more significant change in 2019 that will hopefully minimize the discrepancy between the tests.

Next year, the district will increase the percentage of students who must score at grade-level on the early literacy tests. Currently, fewer than half of an elementary school’s students must score that way for a school to earn points, said Deputy Superintendent Susana Cordova. The district hasn’t yet settled on what the number will be for next year, but it will likely be more than 70 percent, she said. The more points a school earns, the higher its color rating.

The district will also reduce the impact the early literacy test scores have on the ratings by cutting in half the number of points schools can earn related to the tests, Cordova said. This makes the stakes a little lower, even as the district sets a higher bar.

The number of points will go back up in 2019 when the district makes a more significant change, officials said. The change has to do with how the tests are scored.

For the past several years, the district has used the “cut points” set by the test vendors to determine which students are reading at grade-level and which are not. But the discrepancy between the third-grade iStation and PARCC reading scores – and the public outcry it sparked – has caused officials to conclude the vendor cut points are too low.

District officials said they have asked the vendors and the state education department to raise the cut points. But even if they agree, that isn’t a simple or quick fix. In the meantime, the district has developed a set of targets it calls “aimlines” that show how high a student must score on the early literacy tests to be on track to score at grade-level on PARCC, which district officials consider the gold standard measure of what students should know.

The aimlines are essentially higher expectations. A student could be judged to be reading at grade-level according to iStation but considered off-track according to the aimlines.

In 2019, the district will use those aimlines instead of the vendor cut points for the purpose of rating schools. Part of the reason the district is waiting until 2019 is to gather another year of test score data to make sure the aimlines are truly predictive, officials said.

However, the district is encouraging schools to start looking at the aimlines this year. It is also telling families how their students are doing when measured against them. Schools sent letters home to families this past week, a step district critics previously said was a good start.

Van Schoales, CEO of the advocacy group A Plus Colorado, has been among the most persistent critics of this year’s elementary school ratings. He said he’s thrilled the district listened to community concerns and is making changes for next year, though he said it still has work to do to make the ratings easier to understand and more helpful to families.

“We know it’s complicated,” he said. “There is no perfect SPF. We just think we can get to a more perfect SPF with conversations between the district and community folks.”

The district announced other changes to the School Performance Framework next year that will affect all schools, not just elementary schools. They include:

  • Not rating schools on measures for which there is only one year of data available.

Denver’s ratings have always been based on two years of data: for instance, how many students of color met expectations on state math tests in 2016 and how many met expectations in 2017.

But if a school doesn’t have data for one of those years, it will no longer be rated on that measure. One way that could happen is if a school has 20 students of color one year but only 12 the next. Schools must have at least 16 students in a category for their scores to count.

The goal, officials said, is to be more fair and accurate. Some schools complained that judging them based on just one year of data wasn’t fully capturing their performance or progress.

  • Applying the “academic gaps indicator” to all schools without exception.

This year, the district applied a new rule that schools with big gaps between less privileged and more privileged students couldn’t earn its two highest color ratings, blue and green. Schools had to be blue or green on a new “academic gaps indicator” to be blue or green overall.

But district officials made an exception for three schools where nearly all students were from low-income families, reasoning it was difficult to measure gaps when there were so few wealthier students. However, Boasberg said that after soliciting feedback from educators, parents, and advocates, “the overwhelming sentiment was that it should apply to all schools,” in part because it was difficult to find a “natural demographic break point” for exceptions.

Contract Review

Here’s what a deeper probe of grade changing at Memphis schools will cost

PHOTO: Marta W. Aldrich
The board of education for Shelby County Schools is reviewing another contract with a Memphis firm hired last year to look into allegations of grade tampering at Trezevant High School. Board members will discuss the new contract Feb. 20 and vote on it Feb. 27.

A proposed contract with the accounting firm hired to examine Memphis schools with high instances of grade changes contains new details on the scope of the investigation already underway in Shelby County Schools.

The school board is reviewing a $145,000 contract with Dixon Hughes Goodman, the Memphis firm that last year identified nine high schools as having 199 or more grade changes between July 2012 and October 2016. Seven of those are part of the deeper probe, since two others are now outside of the Memphis district’s control.

The investigation includes:

  • Interviewing teachers and administrators;
  • Comparing paper grade books to electronic ones and accompanying grade change forms;
  • Inspecting policies and procedures for how school employees track and submit grades.

In December, the firm recommended “further investigation” into schools with high instances of grade changes. At that time, Superintendent Dorsey Hopson emphasized that not all changes of grades from failing to passing are malicious, but said the district needs to ensure that any changes are proper.

Based on the firm’s hourly rate, a deeper probe could take from 300 to 900 hours. The initial review lasted four months before the firm submitted its report to Shelby County Schools.

The school board is scheduled to vote on the contract Feb. 27.
