Why the Tests Must Be Kept Secret

I’ll be giving my third-graders the state test in another week, which means I had to read this year’s testing manual and something called an “Assessment Integrity Guide.” That’s the one that explains how vital it is that the contents of the tests are kept secret. It’s 44 pages of rules, justifications, warnings, and procedures, all with the aim of helping to “establish, develop, and implement a state assessment system that fairly, accurately, and with validity measures Michigan’s content standards.”

Which, to someone who’s given the test many times and knows the reality, is kind of funny, but I’ll get to that later.

Because states want to ensure the validity of the results (or at least, that’s what they claim), they go to great lengths to keep test items from escaping the classroom walls. Ideally, the items are known only to those who designed them and the students who are subjected to them.

That’s a problem.

Right now, the Texas legislature is considering a flurry of legislation introduced in the wake of a Texas Monthly article about a study that found wild inconsistencies in readability levels on STAAR tests, with some passages at least one grade level higher than the grade they were meant to assess. The study echoes findings reported by the same researchers in 2012. It’s led to a backlash against the test and questions about its validity, with defenders claiming there’s more to reading than Lexile levels and detractors pointing to the tests’ use in high-stakes decisions such as teacher evaluations and student retention.

There was an easy way to prevent the controversy: release the entire test to the public every year once testing had been completed. Let parents, education officials, and legislators see exactly what we’re asking students to know and be able to do. You can bet it would not have taken seven years to come to a head had the tests been available all along. As it stands in Texas right now, the debate is centered around an analysis done by a couple of researchers rather than the contents of the actual tests. Those remain a secret.

So why don’t states simply release the tests each year? Why not get everything out in the open?

According to the Michigan Assessment Integrity Guide,

“The primary goal of assessment security is to protect the integrity of the assessment and to assure that results are accurate and meaningful. To ensure that trends in achievement results can be calculated across years in order to provide longitudinal data, a certain number of test questions must be repeated from year to year. If any of these questions are made public, the validity of the test may be compromised.”

Color me skeptical.

First, let’s use simple language: States don’t want items out in the public because students, parents, and teachers could cheat, which would artificially inflate test scores. False positives, they might be called.

But states seem far less concerned about false negatives. There are few directives in the Assessment Integrity Guide regarding what must be done if a student decides to distract his entire class during testing (he’s supposed to be redirected and then removed, but there are no consequences for administrators who don’t do so).

There is nothing built into the testing system to prevent students from blazing through the tests as fast as they want by just clicking stuff. If a student’s father died the week before testing, she will not receive an exemption from that year’s test because the state is concerned about the integrity of the results. Technology issues are embarrassing, but no state has ever invalidated its results over them, even when they’re widespread. You can be sure their response would be different if those irregularities resulted in potentially higher scores instead of lower ones.

It’s hard to take validity claims seriously when states are so concerned with artificially inflated scores but not at all worried about artificially deflated ones.

Second, the claim that test items can’t be released so longitudinal data can be compared is specious. If you want the most valid longitudinal data, you’d use the exact same test every year, but states don’t do that because they’re afraid of cheating. Also, state tests change with the political winds; in my state, the M-STEP replaced the MEAP and now the M-STEP is on its last legs. There’s also the issue of changing cut scores, which makes it challenging to accurately compare year-to-year data.

If you’re going to keep tests secret, it’s nice to have what seems like a legitimate reason to keep people in the dark, and test validity fits that bill. But since that reason is less than convincing, it’s possible there are other reasons states want the tests shielded from public view. Here are three possibilities.

Money

It costs money to create tests, so one way to spend less is to reuse reading passages and test items. Once items are released, they can’t be used again, so one reason to keep them secret is to save time and money, something Michigan at least admits (in one sentence) in its lengthy Assessment Integrity Guide (page 5).

But is that a good enough reason? Given how much the results impact students, teachers, schools, and the public’s perception of the education system, it seems legislators should be eager to commit the money necessary to develop a high-quality test each year, while also promoting transparency with the aim of assuring the public that the tests are what they’re purported to be (a valid measure of student learning). The only way to do both is to release the tests in their entirety and create new ones each year.

When states claim they have to keep the tests secret because of validity, what they’re really saying is that they’re keeping the tests secret because they’re cheap.

Or maybe it’s because they’re afraid of what the public will think of their tests.

To Perpetuate the Failure Narrative

Every year around test time, someone calls for legislators to take the test. And they should. So should every parent. If states are going to require schools to rate teachers, and if they’re going to release “report cards” for schools, all with the idea that parents should be informed about their child’s education, then why shouldn’t they also release the tests so that parents can see the tool used to determine the other ratings?

Perhaps it’s because states fear that adults might look at the tests and wonder, “What the hell?”

And if they question the tests, then they might question the results of the tests. If they question the results, then they may start to question the rankings of schools and the ratings of teachers that are based on those results. They might even be skeptical about the whole “American education sucks” thing. And if they question that, well… there are a lot of people who have a lot of power and make a lot of money off the “American education sucks” thing.

In fact, we know this is exactly what happens when adults take the tests, or at least the test items that states do release. From just one of many articles written on the subject:

“The first argument arose over a question about how the first paragraph of the reading selection affected “the plot.” The directions said to choose two answers from six choices. We all agreed on one, but three panelists selected three different choices as the second answer.

All were surprised when others didn’t pick the same response, so they advocated for their answers – attempting to sway consensus to their side. A similar scenario played out in two questions that asked test takers to identify the “best” supporting evidence for a conclusion.

In one case only four choices were given, and we picked three different answers. Then we explained and argued and maybe even raised our voices – it got animated a couple of times – and no one changed answers, though we could see the legitimacy of each other’s reasoning.”

For now, this happens in small pockets with people who have a vested interest in how the state uses the results. Were entire tests released to the public en masse, you’d soon have Facebook challenges called “Are You Smarter Than a Fifth Grade Texan?” that would result in widespread ridicule of the exams.

If these tests could be googled, or if they showed up on your timeline, there is a real risk that the failure narrative would fall apart. Every time a journalist lazily wrote that “45% of third-graders can’t read,” she’d be met by an avalanche of online editors who would correctly point out that not reaching an arbitrary cut score on a test designed to separate students into four bands of performance is not the same as saying someone can’t read, especially when the things they’re being asked to read are on esoteric topics or written at an inappropriately high grade level.

You’d have English teachers with Master’s degrees explaining how many questions they missed, how the test determined they were “partially proficient,” and how the tests we’re using to determine students’ language skills don’t themselves use proper grammar.

You’d have mathematicians pointing out mistakes on math tests.

You’d have successful people who do poorly realizing that maybe the tests aren’t predictive of life outcomes.

If the general public could actually see the exams, they might realize that the reading tests aren’t actually testing whether students can read at all. Even assuming the tests are on grade level and the questions are age-appropriate and not deliberately confusing, the tests actually assess whether or not students:

Care enough to carefully read the texts and try their best to answer the questions.
Know enough about test-taking to successfully navigate the many twists, turns, and traps test makers lay for them.
Have the stamina to try just as hard at the end of the test as at the beginning.
Have background knowledge on the topic they’re reading about.
Can answer complicated questions about what they’ve read.
Have had the opportunity to learn the skills being tested, since tests are usually taken before the conclusion of the school year and students may have missed instruction due to absences.
Can focus in a potentially less-than-ideal testing environment.

I try not to be conspiratorial, but when there is big money on the side of school choice and those with that money are using it to buy politicians and write legislation that harms public schools, it’s hard not to consider the idea that many in state government have a personal interest in perpetuating the failure narrative and that they see test results as the surest way to do it.

But that only works if the tests yield results that portray schools negatively. And those portrayals only stick if the general public accepts the results as valid.

And that only happens as long as the tests remain locked away and kept from prying eyes. Because as someone who sees these tests every year, I can tell you that you would be appalled. You would question the very thing states claim they’re trying to preserve. And some of you would not surpass the cut score.