First Person

The Trouble With Not Releasing State Test Items

First Rule of Fight Club: Do Not Talk about Fight Club

Second Rule of Fight Club: DO NOT TALK about Fight Club

Has the New York State Education Department watched too many Brad Pitt movies? Okay, that’s a rhetorical question, but one that might be posed to other state education agencies also engaged in the business of high-stakes testing. This week, students in grades 3 through 8 across the state of New York are taking mathematics exams aligned with the Common Core State Standards. Following on the heels of last week’s English Language Arts exams, the math exams also promise to be unusually challenging, reflecting the complex skills and knowledge inscribed in the Common Core standards.

Regardless of broad pronouncements from policymakers and the media about the inherent superiority of the Common Core standards and the assessments designed to measure mastery of them, the truth is that no one really knows whether the standards will lead to higher student achievement, or whether the assessments will be good measures of students’ readiness for college and careers. In New York, although this year’s assessments are the first to be aligned with the Common Core standards, they have a short shelf-life: the state plans to administer the Partnership for Assessment of Readiness for College and Careers assessments in the spring of 2015, if those assessments are ready for prime time by then.

In the meantime, discussions about the content and quality of the assessments are hamstrung by New York’s decision not to release test items to the public. For educators, the issue is quite serious: Disclosure of secure test items by a teacher or school leader is considered a moral offense that can lead to disciplinary action, including loss of certification.

The strongest arguments in favor of keeping test questions and answers private are technical. It is desirable that different forms of a test, including those administered in different years, be scaled in such a way that a given score represents the same level of performance, regardless of the test form or year. Anchor items are used to link different forms of a test and equate them. Modern test theory uses the difficulty of test items, and their ability to differentiate higher and lower performers, as tools to estimate a test-taker’s proficiency. It’s important for anchor items to keep a stable level of difficulty; if they become easier or harder over time, their ability to serve as a common anchor across test forms is compromised, as is our confidence that a given test score denotes the same level of performance from year to year. A change in the difficulty of a test item over time is referred to as item parameter drift.

Item parameter drift can occur due to changes in curriculum, teaching to a test, or practice. But the biggest risk is from the widespread release of test items, whether unintentionally, as in a security breach, or intentionally. If a wide swath of the test-taking population knows test questions and the right answers, the questions will be easier, even if the test-takers are not more capable. It’s for this reason that questions and answers in educational tests frequently aren’t released to the public: disclosing test questions would limit their ability to be reused and to serve as anchor items.
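To make the drift argument concrete, here is a minimal Python sketch. It assumes a two-parameter logistic (2PL) item response model, one common model in modern test theory; the essay does not say which model New York’s tests use, and the item parameters below are hypothetical values chosen only for illustration. The sketch shows how an anchor item that becomes easier after its content circulates makes test-takers of unchanged ability look more capable to a scorer who keeps using the item’s original difficulty.

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL model: probability that a test-taker with ability theta answers an
    item with discrimination a and difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def implied_theta(p: float, a: float, b: float) -> float:
    """Invert the 2PL curve: the ability that would produce success rate p
    on an item assumed to have discrimination a and difficulty b."""
    return b + math.log(p / (1.0 - p)) / a


# Hypothetical anchor item (illustrative values, not real NYSED parameters).
a = 1.2           # discrimination
b_original = 0.5  # difficulty when the item was first calibrated
b_exposed = -0.3  # assumed easier difficulty after the item is widely known

theta_true = 0.0  # the test-takers are no more capable than before

p_before = p_correct(theta_true, a, b_original)
p_after = p_correct(theta_true, a, b_exposed)

# A scorer still using the original difficulty reads the higher success
# rate as evidence of higher ability.
theta_inferred = implied_theta(p_after, a, b_original)

print(f"P(correct) before exposure: {p_before:.2f}")            # 0.35
print(f"P(correct) after exposure:  {p_after:.2f}")             # 0.59
print(f"Ability inferred with stale b: {theta_inferred:.2f}")   # 0.80
```

A real equating study works with many anchor items at once; a single item simply keeps the arithmetic visible. The same students, with the same true ability of 0.0, would appear to have gained 0.8 on the ability scale purely because the item drifted.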

The National Assessment of Educational Progress is a case in point. The No Child Left Behind Act provides that the public shall have access to all assessment instruments used in NAEP, but that the Commissioner of the National Center for Education Statistics, which houses NAEP, may decline to make available test items that are intended for reuse for up to 10 years after their initial use.

Of course, one of the other features of the lovely NCLB law is that it prohibits the federal government from using NAEP to rank or punish individual students, teachers, schools or local education agencies. For this reason, NAEP is a low-stakes test — despite the ways in which pundits jump to draw broad policy inferences from comparisons of NAEP performance over time or across jurisdictions.

But one could argue that disclosure of test questions and answers may be justified when the test is used for high-stakes decisions such as student promotion, or the evaluation of teachers and/or schools. For most such high-stakes decisions, there are winners and losers, and when these decisions are made by agents of the government, the losers have a legitimate interest in whether the decisions were fair. One need look back no further than last week, when New York City announced that, due to a series of errors made by NCS Pearson, several thousand children were incorrectly classified as ineligible for gifted and talented programs.

Or, if you wish, reach back to last year, when the New York State Education Department discarded a series of items in the Grade 8 English Language Arts exam based on a passage involving a talking pineapple. Not too many people rose to defend the test items associated with this fable involving a hare and a pineapple, but Pearson, the firm contracted to develop and administer the exam, did. The choice of both the passage and the items, the company claimed, “was a sound decision in that ‘The Hare and the Pineapple’ and associated items had been field tested in New York State, yielded appropriate statistics for inclusion, and it was aligned to the appropriate NYS Standard.” Vetted by some teachers, too, I reckon. But with all of that, the passage and items were ludicrous.

One item following the passage asked which of the animals in the passage was the wisest: the moose, crow, hare or owl. Pearson claimed that it was unambiguous that the wisest animal was the owl, based on clues in the text. One such clue was that the owl declared that “Pineapples don’t have sleeves,” which, Pearson reported, was a factually accurate statement. So too, to the best of my knowledge, is the statement that owls don’t talk.

High-stakes tests administered by governmental agencies call for a heightened sense of procedural fairness, including the ability to interrogate the tests, how they were constructed, and what counts as a correct response. The point is not so much that bad test items get discarded, although that may be appropriate from time to time, as that the procedures are subject to scrutiny by those they affect. New York does not have a great recent track record on this. The technical reports on the construction of last year’s state English Language Arts and math tests have not been made public yet, even though we’re in the midst of this year’s testing. And the technical manual for New York’s statewide teacher rankings, a modified version of value-added modeling, was released months ago, before the manuals for the tests on which those rankings were based. It’s hard to know how much to trust the growth percentiles or value-added models without more information on the tests themselves.

Moreover, it may be especially important to have open and public discussions about tests that are aligned with the Common Core standards, which are new to educators and the public. The point of these tests, especially in their earliest administrations, is really not “ripping the Band-Aid off,” as New York City Schools Chancellor Dennis Walcott has declared — nor is it to document just how few students will meet the new standards, as a vehicle for supporting one policy reform or another. Rather, it’s to engage educators, policymakers and the public in a conversation about what we want our students to know, and how we can move them toward the desired levels of knowledge and skill.

And one good way to frame that conversation is to ground it in the discussion of particular assessment questions. Might teachers disagree with one another about what the best answer to an assessment question is? If they do, shouldn’t they be talking about it? Will students have an opportunity to discuss why a response is incorrect, what a better response might be, and why? Or will they simply receive a scale score telling them, and their parents, that they are well below grade-level?

Much has been made of the notion that assessments aligned with the Common Core standards are to be “authentic,” with real-world content that parallels what students might experience in adult daily life. (Ideally, something more sophisticated than “If Johnny has $5.63 and is wearing a pair of Nike Free Run+ 3 shoes, how long will it take him to run to the 7-Eleven to buy a delicious Coca-Cola product?”) If the content is indeed authentic, and reflective of what we expect students to know and be able to do as productive adults, we should be discussing that content, not hiding it under a rock.

There is a middle ground between total nondisclosure of test items and answers, and complete disclosure. It’s possible to retain the security of anchor items while releasing items that won’t be used again. But it’s easier to do this when there’s a more extensive bank of assessment items with known properties, and such an item bank for the Common Core does not yet exist. It may not be the most popular conclusion, but perhaps we should be investing more in the development of good assessment items.

First Rule of High-Stakes Assessments: Talk about High-Stakes Assessments

Second Rule of High-Stakes Assessments: TALK about High-Stakes Assessments

This post also appeared on The Hechinger Report’s Eye on Education blog.

First Person

I’m a Florida teacher in the era of school shootings. This is the terrifying reality of my classroom during a lockdown drill.

Outside of Marjory Stoneman Douglas High School in Parkland, Florida. (Photo by Mark Wilson/Getty Images)

“Remember,” I tell the children, looking them in the eyes in the darkened classroom. “Remember to keep the scissors open. They’ll stab better that way.”

My students, the target demographic for many a Disney Channel sitcom, laugh nervously at me as they try to go back to their conversations. I stare at the talkative tweens huddling in a corner and sigh.

“Seriously, class,” I say in the tone that teachers use to make goosebumps rise. As they turn back to me with nervous laughter, I hold up that much-maligned classroom tool, the metal scissor that’s completely ineffective at cutting paper. “If a gunman breaks in, I’ll be in the opposite corner with the utility knife.” Said tool is in my hand, and more often used to cut cardboard for projects. All the blood it’s hitherto tasted has been accidental. “If I distract him and you can’t get out, we have to rush him.” I don’t mention that my classroom is basically an inescapable choke point. It is the barrel. We are the fish.

They lapse into silence, sitting between the wires under the corner computer tables. I return to my corner, sidestepping a pile of marbles I’ve poured out as a first line of defense, staring at the classroom door. It’s been two hours of this interminable lockdown. This can’t be a drill, but no information will be forthcoming until it’s all over.

I wonder whether I really believe these actions would do anything, or whether I am just inflicting on my students and myself the 21st-century version of those old “Duck and Cover” posters.

We wait.

The lockdown eventually ends. I file it away in the back of my head like the others. Scissors are handed back with apathy, as if we were just cutting out paper continents for a plate tectonics lab. The tool and marbles go back into the engineering closet. And then, this Wednesday, the unreal urge to arm myself in my classroom comes back. A live feed on the television shows students streaming out of Marjory Stoneman Douglas, a high school just a short drive away. I wonder whether the teachers in its classrooms have passed out scissors.


The weapons. It’s not a subject we teachers enjoy bringing up. You’d have an easier time starting a discussion on religion or politics in the teachers’ lounge than asking how we all prepare for the darkness of the lockdown. Do you try to make everyone cower, maybe rely on prayer? Perhaps you always try to convince yourself it’s a drill. Maybe you just assume that, if a gun comes through the door, your ticket is well and truly up. Whatever token preparation you make, if any, once belonged only to the secret corners of your own soul.

In the aftermath of Parkland, teachers across the nation are starting to speak. The experience of being isolated, uninformed, and responsible for the lives of dozens of children is now universal to our profession, whether because of actual emergencies or planned drills. You don’t usually learn which is which until at least an hour in, and sometimes not until afterward. In both cases, the struggle to control the dread and keep wearing the mask of bravery for your students is the same.

And you need a weapon.

I’ve heard of everything from broken chair legs lying around that never seem to be thrown away to metal baseball bats provided by administration. One teacher from another district dealt with it by always keeping a screwdriver on her desk. “For construction projects,” she told students. She taught English.

There’s always talk, half-jokingly (and lately, less jokingly), from people who want teachers armed. I have a friend in a position that far outranks my own whose resignation letter is ready for the day teachers are allowed to carry guns in the classroom.

I mean, we’ve all known teachers who’ve had their cell phones stolen by students …


Years earlier, I am in the same corner. I am more naïve, the most soul-shaking of American massacres still yet to come. The corner is a mess of cardboard boxes gathered for class projects, and one of them is big enough for several students to crawl inside.

One girl is crying, her friend hugging her as she shakes. She’s a sensitive girl; a religious disagreement between her friends once brought her to tears. “How can they be so cruel to each other?” she asked me after one of them had said that Catholics didn’t count as Christians.

I frown. It’s really my fault. An offhand comment on how the kids needed to quiet down because I’m not ready to die pushed her too far. Seriously rolling mortality around in her head, she wants nothing more than to call her family. None of the students are allowed to touch their cell phones, however, and the reasoning makes sense to me. The last thing we need is a mob of terrified parents pouring onto campus if someone’s looking to pad their body count.

She has to go to the bathroom, and there are no good options.

I sit with her, trying to comfort her, wondering what the occasion is. Is there a shooter? Maybe a rumor has circulated online. Possibly there’s just a fleeing criminal with a gun at large and headed into our area. Keeping watch with a room full of potential hostages, I wonder if I can risk letting her crawl through the inner building corridors until she reaches a teacher’s bathroom. We wait together.

It seemed different when I was a teen. In those brighter pre-Columbine times, the idea of a school shooting was unreal to me, just the plot of that one Richard Bachman book that never seemed to show up in used bookstores. I hadn’t known back then that Bachman (really Stephen King) had it pulled from circulation after it’d been found in a real school shooter’s locker.

Back then my high school had plenty of bomb threats, but they were a joke. We’d all march out around the flagpole, sitting laughably close to the school, and enjoy the break. Inevitably, we’d all learn that the threat had been called in by a student in the grip of “senioritis,” a seemingly incurable disease that removes the victim’s desire to work. We’d sit and chat and smile and never for a second consider that any of us could be in physical danger. The only threat we faced while waiting was boredom.


Today, in our new era of mass shootings, the school districts do what they can, trying to plan comprehensively for a situation too insane to grasp. Law enforcement officials lecture the faculty yearly, giving well-rehearsed speeches on procedures while including a litany of horrors meant to teach by example.

At this level, we can only react to the horrors of the world. The power to alter things is given to legislators and representatives who’ve been entrusted with the responsibility to govern wisely while listening to the will of the people. It’s they who can change the facts on the ground, enact new laws, and examine existing regulations. They can work toward a world where a lockdown is no longer needed for a preteen to grapple with gut-churning fear.

We’re still waiting.

K.T. Katzmann is a teacher in Broward County, Florida. This piece first appeared on The Trace, a nonprofit news site focused on gun violence.

First Person

What we’ve learned from leading schools in Denver’s Luminary network — and how we’ve used our financial freedom

PHOTO: Nicholas Garcia
Cole Arts and Science Academy Principal Jennifer Jackson sits with students at a school meeting in November 2015.

First Person is a standing feature where guest contributors write about pressing issues in public education.

Three years ago, we were among a group of Denver principals who began meeting to tackle an important question: How could we use Colorado’s innovation schools law to take our schools to the next level?

As leaders of innovation schools, we already had the ability to make our own choices around the curriculum, length of school day, and staffing at our campuses. But some of us concluded that by joining forces as an independent network, we could do even more. From those early meetings, the Luminary Learning Network, Denver’s first school innovation zone, was born.

Now, our day-to-day operations are managed by an independent nonprofit, but we’re still ultimately answerable to Denver Public Schools and its board. This arrangement allows us to operate with many of the freedoms of charter schools while remaining within the DPS fold.

Our four-school network is now in its second year of trying this new structure. Already, we have learned some valuable lessons.

One is that having more control over our school budget dollars is a powerful way to target our greatest needs. At Cole Arts & Science Academy, we recognized that we could serve our scholars more effectively and thoughtfully if we had more tools for dealing with children experiencing trauma. The budget flexibility provided by the Luminary Learning Network meant we were able to provide staff members with more than 40 hours of specially targeted professional development.

In post-training surveys, 98 percent of our staff members reported the training was effective, and many said it has helped them better manage behavioral issues in the classroom. Since the training, the number of student behavior incidents leading to office referrals has decreased from 545 incidents in 2016 to 54 in 2017.

At Denver Green School, we’ve hired a full-time school psychologist to help meet our students’ social-emotional learning goals. She has proved to be an invaluable resource for our school, a piece we were missing without even realizing how important it could be. With a full-time person on board, we have been able to take proactive steps like group and individual counseling, neither of which we could offer before with only a part-time social worker or school psychologist.

Both of us have also found that having our own executive coaches has helped us grow as school leaders. Having a coach who knows you and your school well allows you to be more open, honest, and vulnerable. This leads to greater professional growth and more effective leadership.

Another lesson: scale matters. As a network, we have developed our own school review process – non-punitive site visits where each school community receives honest, targeted feedback from a team of respected peers. Our teachers participate in a single cross-school teacher council to share common challenges and explore solutions. And because we’re a network of just four schools, both the teacher council and the school reviews are small-scale, educator-driven, and uniquely useful to our schools and our students. (We discuss this more in a recently published case study.)

Finally, the ability to opt out of some district services has freed us from many meetings that used to take us out of our buildings frequently. Having more time to visit classrooms and walk the halls helps us keep our fingers on the pulse of our schools, to support teachers, and to increase student achievement.

We’ve also had to make trade-offs. As part of the district, we still pay for some things (like sports programs) our specific schools don’t use. And since we’re building a new structure, it’s not always clear how all of the pieces fit together best.

But 18 months into the Luminary Learning Network experiment, we are convinced we have devised a strategy that can make a real difference for students, educators, and school leaders.

Watch our results. We are confident that over the next couple of years, they will prove our case.

Jennifer Jackson is the principal of Cole Arts & Science Academy, which serves students from early childhood to grade five with a focus on the arts, science, and literacy. Frank Coyne is a lead partner at Denver Green School, which serves students from early childhood to grade eight with a focus on sustainability.