Let’s Talk about Tests: Four Questions to Ask

If you follow education news, politics, and social media, it’s clear that testing is having a moment. I was surprised it wasn’t listed alongside Taylor Swift as a nominee for Time magazine’s 2014 Person of the Year. Everyone–policymakers, unions, state leaders, local administrators, teachers, parents, you name it–seems to agree that the amount of testing and its role in America’s schools and classrooms merit reconsideration. But the momentum of this “over-testing” meme has overshadowed the fact that testing policy is complicated. And when the field talks about “over-testing,” it’s often not talking about the same kinds of tests or the same set of issues.

To help clarify and elevate our over-testing conversation (because it’s here to stay), here are four questions to ask, with considerations to weigh, when deciding whether testing is indeed out of control–and evaluating the possible options to change it.

1. Standardized? In AP Biology, I took a test every week–but only one of them was “standardized” in the way most use the term: the AP test at the end of the year. Debates about testing, however, tend to ignore the common, teacher-developed ones used to assess students’ grasp of content throughout the year (which, let’s be honest, also require a significant amount of time to prepare and take).

In other words, they are usually debates about over-standardized testing–the large-scale assessments, often multiple choice, that are given to thousands of students, with consistent scoring and comparable results. Examples are numerous and varied: Advanced Placement tests, the New York Regents exams, NAEP, Smarter Balanced, Texas STAAR testing, NWEA’s Measures of Academic Progress (MAP), the ACT, Teaching Strategies GOLD, and so on. But with such differences in scope, development, design, and purpose, debating the value of “standardized” testing alone fails to get to the heart, or complexity, of the issue.

2. How much? A growing point of contention is whether standardized tests should be given annually or in grade-spans (once in elementary school, once in middle school, and once in high school). Current federal law is actually a mix of both. Since NCLB’s passage, states have been required to administer standardized math and reading tests annually in grades 3-8, but only once in high school. Similarly, state science assessments must be given once in each grade-span (grades 3-5, 6-9, and 10-12). Prior to NCLB, the Improving America’s Schools Act of 1994 required states to have grade-span testing in reading and math for the first time–and many, especially teachers’ unions, are hoping Congress reverts to that earlier testing mandate, with several possible bills pending or expected in 2015.

But no matter the statutory language, in practice, the question of how much standardized testing is too much can become even more complex. For example, grade-span testing could be staggered so that students take reading tests in grades 3, 6, and 9, math tests in grades 4, 7, and 10, and science tests in grades 5, 8, and 11. In other words, there are still standardized tests every year. Or a system of state tests in three grades could be given on top of a system of district tests given in three different grades. Again, the result looks a lot like “annual testing,” just not the kind of statewide, comparable annual testing system NCLB requires today.

To summarize, for those caught up in the over-testing angst, the problem may not be the fact that standardized testing occurs each year, but rather the kinds and number of standardized tests that are administered to kids.

3. Administered to whom? One way some have suggested to combat over-testing is to use randomized sampling techniques, similar to the NAEP. NCLB requires all states to participate in NAEP testing every two years, but not every student participates, and NAEP tests are not administered in every school. Instead, schools and students are selected randomly to participate so that enough students take the NAEP test for it to produce usable data for the “all students” group and for particular subgroups, like Black, Hispanic, and low-income kids. Putting aside the fact that NCLB requires assessments to be given to all students and even dings schools in its accountability requirements if they have low participation rates (after all, the law could change), sampling would make it more difficult to produce usable achievement data for individual districts and schools, especially in small schools or rural areas. There’s a reason NAEP can only produce statewide results (plus results for a select group of large districts like Chicago and Atlanta)–the more detailed the information desired, the more students must be included in the sample.
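A rough way to see why detail demands bigger samples is the textbook margin of error for a proportion, such as the share of students scoring proficient. The sketch below assumes simple random sampling and a 95 percent confidence level; NAEP’s actual sampling design is more complex, and the sample sizes here are hypothetical numbers chosen only for illustration.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from a
    simple random sample of n students (an assumption made for illustration;
    real assessment samples use more complex designs)."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical sample sizes, not actual NAEP figures.
samples = [
    ("statewide sample", 2500),
    ("single large district", 400),
    ("one small school", 25),
]

for label, n in samples:
    points = margin_of_error(n) * 100
    print(f"{label} (n={n}): +/- {points:.1f} percentage points")
```

With these made-up numbers, the uncertainty grows from about 2 percentage points statewide to nearly 20 points for a single small school, which is far too wide to say much about that school’s performance.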

Another drawback? Tests may need to be used for more than trend analysis–we may actually want accurate information on individual students, teachers, or schools. States may want to use assessments to validate a particular policy approach in a network of schools, identify students at risk of dropping out of high school in time to intervene, or determine which schools need additional supports, resources, and technical assistance. Not to mention, some kids wouldn’t be tested at all. What kind of information would these students and their families receive about their educational progress? What kind of message does that send about the importance of mastering state standards? About the importance of equal opportunities and fairness? Sampling is a valid technique, yes, but it may not be the right technique for what policymakers and educators need these tests to accomplish.

4. Administered by whom? And finally, in the ongoing debate about testing, there’s still the question of which part of our fragmented education system administers these tests. NCLB requires states to develop systems for assessing the achievement of all students against the state’s standards, with only limited options for alternate assessments for students with significant disabilities. In simpler terms: students must take the same test, in every district, statewide. And as recent studies have found, many districts then offer their own tests as well, to the point that local tests outnumber the state ones required by NCLB. Increasingly, some districts would like new flexibility to reclaim testing as their own. They propose to administer local assessments in some grades, and the statewide assessment in others (federal policymakers would need to enable this change, as it conflicts with current law).

This might reduce the number of standardized tests students take, especially if they attend school in a district that has adopted numerous local tests on top of what the state requires. But it would certainly increase the variety of tests administered within a state, and in turn, make it much more challenging to compare results across districts or states. Worse, it would make it extremely difficult, if not impossible, to measure student growth if local assessments are not aligned from one year to the next, or to each other. Losing comparable data would be a blow not just for accountability, evaluation, and research, but also for communicating about the state of our education system and making smart policy decisions.

As the testing debate continues in the new year, it’s time for the education field to get a little more specific about the testing problem it’s trying to solve–and the trade-offs the proposed solutions may create.

