The Tests Don’t Test What They Purport to Test

My third graders (they’re eight or nine years old — keep that in mind) have finished their state testing for the year. They got off easy — just two tests, ELA and Math. The State of Michigan, a few weeks before the testing window opened, sent a letter to parents explaining the purpose of the tests:

“[The tests] are designed to provide information on student knowledge and ability to be career- and college-ready upon graduation. Schools and districts use the results for curriculum planning and school improvement initiatives that benefit all students.” [Source]

Let’s ignore for the moment the dubious claim that any test can predict how “career- and college-ready” a person will be nine years later and focus on what the tests are supposedly designed to do: “provide information on student knowledge.”

If that were true, most teachers would be fine with them. Want to find out if kids can read? Give them something to read and ask them a handful of questions about it. Need to determine if teachers are teaching kids math? Give them 20 math problems that they might someday encounter in the real world and see if they can figure them out.

Confucius said, “Life is really simple, but we insist on making it complicated.”

The same is true of those who make these tests.

Because instead of assessing whether or not my third-graders can read or do math, here is what they were really tested on:

Stamina

It took my best student nearly three hours to finish the ELA test and almost two to complete the math test. Let me state this as clearly as I can: You don’t need three hours to find out if a kid can read or do math. A three-hour test doesn’t assess ability; it tests stamina.

How to Navigate Foreign Formats

The state of Michigan provides a way for students to practice using the tools they will encounter on the tests. And, in fairness, it does a decent job. However, of the 30 questions provided, not one of them required students to click on the tool necessary to enter a fraction for an answer. So I’ll give you one guess what happened when my third-graders had to enter a fraction on the actual exam.

Intrinsic Motivation

There are, as yet, no stakes attached to these tests for the students. And they know this. The state of Michigan makes sure of it. From the same letter as the one referenced above:

“State assessment results do not impact student grades.”

They don’t impact anything else, either, as far as the students are concerned. Which means that there are really only two reasons for them to try their best. Either they’ve learned to always give their all, or they want to please adults. No wonder, then, that a handful of students race through the test every year. I can’t say I don’t understand why.

In an earlier article, I suggested a potential remedy: bribery. That’s because studies show that it works. In one, researchers concluded that if the U.S. had used financial incentives during the 2012 PISA test, the country’s math ranking would have risen from 36th to 19th. In another, the impact of incentives had an effect size similar to a one-standard-deviation increase in teacher quality or a 20% reduction in class size.

Grit

For the record, I have no problem with these tests assessing student perseverance. Persistence, grit, or whatever you want to call it is a trait that serves people well in school and in life. The tests do an excellent job of assessing it with long reading passages lined up one after another and multistep story problems embedded in a test that students know will take more than two hours.

But if grit is what we’re testing, then be honest about it. Don’t say it’s a reading test or a math test, when it’s really a test of character. School districts use these results to make curricular decisions and journalists report the results to inform the public on the status of our schools. So when a test doesn’t test what it purports to test, it leaves districts fixing a problem that may not exist while ignoring those that do, and newspapers describing the wrong deficiencies.

Reading Ability

You have to be a good reader to do well on the ELA test. That’s good, to a certain extent. But what the test doesn’t do well is determine which reading skills students have. If you want to find out if students understand cause-and-effect relationships, you can do that with a fairly simple text. Same goes for every other skill students are supposed to learn K-3. But the test doesn’t include below-grade-level passages, which means that if you can’t read the text, it doesn’t matter how well you can find the main idea, or understand the organizational structure of a non-fiction article, or differentiate between your and the author’s point of view, or literally any other thing your teacher did an excellent job teaching you.

And if you aren’t reading at grade level come test time, you’re really screwed on the math test. Because the math test only sort of tests math knowledge. Mostly, it’s another reading test. So when the state of Michigan claims that schools “use the results for curriculum planning” and the results show that your students aren’t very good at math, you might want to think twice before throwing out your math curriculum, because you may have a reading problem.

Don’t Trust the Results

The tests don’t test what they purport to test, which makes the results confusing and not very useful. Schools believe they have a problem here when they may actually have a problem there. Journalists write stories with headlines like:

Less than half of 12th-graders can read or do math proficiently

65 Percent of Public School 8th Graders Not Proficient in Reading 

Only 25% of Nashville elementary, middle school students on grade level in reading, math

Those are misleading, and they are the gift that keeps on giving to those who want to dismantle public schools. If you set out to design a system to undermine public education, you could do a lot worse than designing tests that are harder than they used to be, longer than they need to be, and have no stakes for the people who take them.

There’s a saying, “Don’t believe the hype,” which suggests people ignore the marketing and media buzz around a phenomenon. When it comes to the standardized tests students are taking today, I suggest people “not believe the tripe.” Because the tests just don’t test what they claim to test.

 

Whole-Grain Pancakes and Courageous Teachers

The headline jumped at me from my Facebook feed.

Middle School teacher says he was suspended for making pancakes during PSSAs

My first reaction was, “Clickbait. There must be more to the story.” So I read it. And there was more to the story. By the time I got to the end of it, I said, “You have to be (expletive) kidding me.” I had to repress a very strong impulse to fire off a fusillade of emails to the many moronic adults involved in this, um… incident(?).

Here’s what happened: It was testing day. An eighth-grade social studies teacher in Pennsylvania named Kyle Byler decided to make whole-grain pancakes for his students so they could eat during the test. The assistant principal, a woman with the perfectly villainous surname of Grill, walked in, and, according to an article on Lancaster Online, “questioned why he was making breakfast for his students.”

(Because, how dare he…?)

Within 24 hours, Byler was pulled into a meeting with administrators. He left that meeting convinced he was going to be fired.

Byler is, of course, exactly the kind of teacher who always seems to pop up in stories like these. He’s effective, dedicated, selfless, and popular. Parents call him “the eighth-grade dad.” Students call him “an awesome teacher.” He helps out with student council and coaches basketball. So it’s probably not surprising that 30 students spent two hours protesting outside the middle school when Byler wasn’t at work the following day and 100 people showed up at the next school board meeting.

Byler wasn’t sure what he did wrong. Neither is any other thinking person. But Nicole Reigelman, who has the thankless job of being the spokesperson for the Pennsylvania DOE, had an idea. While serving food is not actually a violation of any testing rule, tending to a griddle, according to Reigelman, “would have likely interfered with ‘actively monitoring’ the assessment.”

Let’s think about that. The state tells teachers that they have to “actively monitor” students during a test that teachers don’t want to give in the first place, that will be used to label their schools as failures, that will feed the bullshit narrative that American schools are failing, and that can result in a low evaluation and possibly even their own dismissal.

And the reason teachers have to “actively monitor” students is to ensure that the results are valid. Except that, regardless of how well students are actively monitored, the test results aren’t valid. The tests are taken over the course of just a few days out of the whole year and there are no stakes for the students, which means there’s really no reason for students to even try on them.

So, really, teachers are supposed to actively monitor their students to ensure the appearance of validity, so that when the state — results now clutched firmly in its punitive fist — comes back and says, “You guys suck,” everyone can nod their heads and say, “Well, those teachers were really watching those kids. We know they didn’t cheat, so I guess they really do suck.” (And since 95% of students at Byler’s school come from low-income households, you can be pretty sure that’s exactly what the state will say.)

The reason the teacher is asked to ensure this veneer of validity for a test that is likely to be used to harm both teachers and students is that, even though the state claims these tests are so important that it has to pass rules to ensure students are actively monitored, they’re not quite important enough for the state to hire its own proctors to administer the exams. That would cost money, so the state dumps the job on teachers.

The ones who better not serve any damn whole-grain pancakes during their precious tests.

But if the surreal stupidity ended with the Pennsylvania Department of Education, that wouldn’t be so egregious. We expect Kafkaesque bureaucracies. Let’s talk about the assistant principal, Marian Grill.

One of Byler’s students is quoted in the article as saying, “The moment she walked in, everybody turned. She was the distraction. Not pancakes. Not Byler.”

Grill is an educator. Or at least, that’s what she’s supposed to be. And the ball was totally in her court in this situation. Not only did she drop that ball, she jammed a screwdriver through it. Here is what Grill should have done upon entering Byler’s room:

–Noticed students quietly working on their tests while eating whole-grain pancakes.
–Thought to herself, “What a dedicated teacher these students have. Not only is he trying to ensure they do their best on this important test by doing exactly what the research says schools should do (feed kids), he’s doing it out of his own pocket.”
–Smiled at Mr. Byler. Gave him a thumbs-up. Maybe asked for a pancake. Left the room.

I don’t know Marian Grill, but I think I know her type. She seems like the kind of administrator who watches you teach a flawless lesson, then criticizes you because the floor was messy or Joey was leaning in his chair. She’s the member of the homeowners’ association who has a problem with you flying an Easter flag. She’s the kind of person who, intoxicated by even the smallest amount of power, abuses the hell out of it. And I guarantee you that Marian Grill has no problem with pancakes. She has a problem with teachers doing things without clearing them with her first.

This should have ended with her, if only her ego had allowed it to.

Fortunately, petty tyrants like Marian Grill can be quickly exposed in today’s world. Just ten years ago, assistant principals like Grill could act with impunity. With an obvious imbalance of power and an awful economy, teachers wouldn’t take the risk of antagonizing their bosses. Times have changed, and social media is mistreated teachers’ strongest weapon. It can do what your feckless union can’t or won’t.

You don’t need strength in numbers.

You don’t need t-shirts.

You don’t need a vote.

All you need is a compelling story and to be in the right.

You see the influence of social media across the country, from the West Virginia and Oklahoma walkouts, organized without union leadership by teachers who put out the call on Facebook and Twitter, to individual teachers like Kyle Byler, who, instead of keeping his mouth shut out of fear of sabotaging his chances of finding another job after losing this one, had the courage to fight back by simply telling his story and letting the indignant masses do what indignant masses do in the digital age.

Byler kept his job, and the school district, as districts often do when caught with their pants around their ankles, claimed that no, no, no, his job was never in any jeopardy at all.

You can believe the embarrassed school district officials who didn’t want this thing getting any bigger than it had, or you can believe the teacher.

Regardless, his district owes him more than his job. He should never have had to fear for it in the first place. They owe him an apology because they’re the ones who lost sight of the purpose of education. They owe him the money they withheld during his suspension. They might owe him a new assistant principal.

The lessons here are many.

First, state tests make people act like fools. It’s the unintended consequences of these tests that are always the problem. Well-meaning people lose focus on what really matters in their quest to tack a couple of percentage points onto last year’s scores.

Second, we need administrators to rise above misguided state priorities. Just because the state tells them to care about the test doesn’t mean they have to. Just because the state wants third-graders “college and career-ready” doesn’t mean educators have to buy into that standard. Policies aren’t made by people in schools. That’s why so many of them stink. But administrators and teachers are in schools. They are the experts. They know better. And sometimes, they need whole-grain pancakes more than they need to be actively monitored.

Third, we need more courageous teachers like Kyle Byler. As he and the teachers who walked out across this country have proven, teachers who stand up and speak out, who call attention to exploitation, unfairness, and plain old human stupidity, not only improve their own circumstances but also make things better for teachers everywhere.

So serve the whole-grain pancakes. Do what’s right for kids. And if someone tries to stop you, plaster their name all over the Internet. They deserve what they get.

Want Better Scores on the State Test? Bribe Your Students!

Way back when “Return of the Mack” was on regular rotation in my off-campus apartment and Randy Quaid saved the planet from aliens, I first learned about Alfie Kohn. I was in an undergraduate teacher prep class and we read an article of Kohn’s (it might have been this one) where he argued that rewarding kids at school for things they did well wasn’t any better than punishing them for things they did poorly. Kohn expands on this idea in his book, Punished By Rewards, which made a big splash in the 90s because, while society had moved away from the draconian punishments of yesteryear and state laws now forbade corporal punishment, rewards were passed out like, well, candy. Or colorful pencils. Or those awesome scratch-and-sniff stickers. Or gold stars. Or promises of ice cream parties. Or erasers. Or, well, you get the point. And now here came Kohn scolding teachers all over again.

And so I started my teaching career as most naive, just-released-from-college kids do. With the proper amount of self-righteousness and arrogance, I marched into my classroom determined to offer no rewards. Students would learn for knowledge’s sake. We would build a community and have respect for each other. We would talk about our problems and address underlying causes of misbehavior.

Then the real world hit and doing all of those things was really, really hard.

Some kids were just plain jerks who needed to be taught a few hard lessons, if only so the rest of the class would see that you can’t go through life treating people like dirt and get nothing harsher than a counseling session, a behavior plan, and rewards for doing the very things every other kid in the class was doing as a matter of course. And so I started rewarding some kids, punishing others, and playing that whole game.

And not long after that, I learned first-hand what I had read in a boring old classroom. Alfie was right. Rewards don’t really work. They’re manipulative, frequently arbitrary, and basically no different than punishments (they just feel nicer).

Fast-forward to 2011 and Daniel Pink’s book, Drive, made many of the same arguments. Citing some of the same research as Kohn, Pink concluded that extrinsic rewards are usually a bad idea. Motivation is largely intrinsic, and the way to tap into that motivation is through autonomy, a slow and steady march toward mastery, and meaningful work in service to something larger than the self.

All of that is well and good. I accept that it’s generally a bad idea to reward students for their performance and to bribe them to behave better. Make the work interesting. Offer choice. Don’t be such a dictator. Provide feedback so students understand their progress toward mastery. Assign meaningful work. Do all that.

HOWEVER.

The testing window opened in my state this week. Over the next two months, students from third graders to high school juniors will take The Big Test. And big it is. Schools will be judged on the results. They’ll be labeled on some silly statewide reporting system. Some will face consequences. Teachers will be evaluated based on the results. Some may lose their jobs. The scores will influence public opinion of American education as a whole and either burnish or tarnish the reputations of districts, schools, and even entire states’ education systems and policies.

There are plenty of problems with The Big Test (one of which might be the questionable timing of asking students to take it after they’ve just had 10 days off for spring break, as my wife’s students did this week), but perhaps none is bigger than this:

There is no reason students should try hard on it.

In my state, students get nothing for doing well (it’s kind of like being a teacher in that regard).

No scholarship money.
No name in the local paper.
Not even a pat on the back.

Students suffer no negative consequences for doing poorly.* Nothing will happen to a student who decides to treat the entire enterprise exactly how it deserves to be treated, as a joke. Their scores won’t be reflected on their report cards. Grade point averages will be unaffected. Graduation is not at risk. Students’ parents won’t even learn the results for a number of months after the test is over (and by then, most won’t care). Students won’t be retained or asked to leave school. The only thing they lose is time, and they lose more of it the harder they try.

Measured against Pink’s criteria, the tests offer their takers no autonomy. Because each test is a one-time event for which they receive no useful feedback, students cannot progress toward mastery. As for meaning, there is no purpose that students give a hoot about. It is, for almost every student, the very definition of drudgery. It’s busy work. By the state’s own declared aims, it’s got nothing to do with them. For students, it’s as low-stakes as you can get.

All of which is why you should unabashedly bribe your students to take their time and do their best.

In the adult world, we offer money. In the classroom, we offer pizza, ice cream, a dance party, video game time, or anything that will make students think twice before just clicking on answers so they can be done with the thing. When there is no expectation of intrinsic motivation, we have to find other ways to get people to try.

And here’s the thing: Bribery works! I have proof!

Every three years, 15-year-olds from around the world take the PISA exam. The results of this test are reported breathlessly in education circles and often lead to huge policy changes in the countries of the students who struggle. A group of researchers wondered an obvious thing. Did kids actually try on these tests? They had reason to be skeptical. There are no stakes for students who take the PISA; they never even get to see their results. And student effort matters. As I tell a handful of parents every year, it’s hard to report on a student’s abilities when they don’t try on their work.

American students traditionally fall in the middle of the pack on the PISA, but perhaps they underperform because they just don’t see the point in doing their best. The researchers decided to test motivation by paying students for their performance. So they pulled 25 math questions off previous PISA exams and they split students into two groups. One group’s participants received $25 and then handed over a buck for every question they missed. Students in the other group got nothing. Here’s what researchers found:

  • Students from Shanghai, who ranked first on the 2012 PISA, did just as well whether they were paid or not.
  • With the exception of low-ability students, U.S. students did better if they were paid.
  • When paid, U.S. students attempted more questions in the second half of the test and were more likely to answer those which they did attempt correctly.
  • Researchers predicted that if the U.S. had used financial incentives during the 2012 PISA test, the country’s math ranking would have risen to 19th, from 36th. (And to 32nd if all other countries also paid their students.)

Here’s a graph:

And here’s more about the study if you want the dirty deets.

Steven Levitt, the economist famous for co-writing the Freakonomics books, performed similar experiments in three Chicago schools. Bribery worked there, too. While there was some variation, Levitt and colleagues concluded:

“The magnitude of the impact of the incentives on that day’s test are quite large: approximately 0.12−0.22 standard deviations, which is similar to effect sizes achieved through a one-standard deviation increase in teacher quality or 20% reductions in class size.”

“Overall, we conclude that both financial and non-financial incentives can serve as useful tools to increase student effort and motivation on otherwise low-stakes assessment tests.”

To bribe effectively, Levitt’s research suggests you do the following:

Offer immediate rewards

If students have to wait, bribery doesn’t work. So you won’t be able to bribe students for improved performance on the state test because the results take too long. But you can bribe them for their effort, and the research suggests that you should.

Have established credibility

Levitt had the most success bribing students at the school where he had done previous experiments. Students there believed him when he said they would get money for doing well. He had less success at less familiar schools. Levitt surmised that those students, having never been paid to perform in a school setting, probably didn’t believe he would deliver and so the proffered bribe had little impact on motivation.

Leverage the power of loss aversion

Bribery worked better when students were given the reward at the start and knew they would have to give it back if they failed. So if you really want to be effective (and yes, maybe a little cruel), buy your class donuts before the test, place one on the corner of each desk, and threaten to take it away if you think they aren’t trying their hardest. (Hey, quit looking at me like that. I’m just reporting the science.)

Consider the age of your students

Smaller rewards work with smaller kids, but you’ll need better stuff for high schoolers. Cheap little trophies worked just as well with elementary students as did the promise of ten bucks. However, it took a larger dollar amount ($20) to get older kids to give a damn.

You can read the whole study here. But if you would rather not, I understand. And I’m not going to bribe you to do so.

* I am aware that there are stakes for certain students. Students in states with third-grade reading laws that require retention (my state of Michigan joined that merry bandwagon last year) and students who have to pay to retake the SAT may have all the motivation they need to try hard.