Three Forces Working Against Assessment Coherence

By Sarah Quesen, Director of Assessment, WestEd

There is no shortage of frameworks describing what a coherent assessment system should look like. WestEd has defined characteristics of coherent and effective assessment systems (Arnold & Webb, 2024). The National Academy of Education (NAEd) published a volume on coherent assessment systems in 2024. The Center for Assessment has written extensively about the topic. And yet coherent systems remain rare in practice. As the NAEd editors noted, “There are few examples of well-functioning assessment systems where substantive coherence can be seen among the representations of learning goals at classroom, district, and state levels” (Marion et al., 2024, p. 5).

Why is this? Largely because there are external forces that often work against coherence, and we don’t talk about them enough. By coherence, we mean a system in which each assessment serves a distinct and understood purpose, the measures work together rather than in contradiction, and the people using the data have the literacy to act on them. We frame the barriers to coherence in terms of three forces. None of these forces is the work of bad actors. Each is driven by reasonable people responding to real pressures, which is part of what makes the forces so persistent.

Force 1: Add but Never Subtract

Every new concern in education produces a new assessment. Dissatisfaction with reading outcomes? Add screeners. Accountability pressure? Add interims. A legislative mandate? Add another measure. Legislatures, commissioners, and advocacy groups often push in the same direction: more measurement.

But nothing gets removed. Nobody wants to be the person who eliminates an assessment because, if a student struggles, the question becomes “Why did you take that measure away?” It’s the “Chesterton’s Fence” of testing: We don’t know why this test was put here in 2012, so we’re terrified to tear it down in 2026.

A secondary reason is institutional: Every assessment accumulates a constituency. Contracts, budget lines, and reporting routines build up around a measure over time, and nobody wants to pick the fight that comes with taking one away.

The result is that districts end up administering assessments for which they cannot explain the purpose. In our work with states, we have seen districts struggle to articulate why a given assessment is administered when asked to do so as part of a statewide systems audit.

The cycle: Results on the summative don’t improve, so someone adds a new interim or benchmark. The measures use different scales, so they don’t always agree. That creates confusion about which data to trust, and this leads to results that still don’t improve. Then the cycle continues.

States and districts are investing heavily in interim and benchmark assessments, often in response to superintendents and school boards looking for earlier signals about student progress, without a lot of evidence that these investments translate into better outcomes for students. The NAEd volume on coherent assessment systems notes that systems can become measurement-heavy while still failing to build the classroom assessment practices and shared understanding that make data usable (Marion et al., 2024). The result is what we think of as the assessment paradox: more measurement, but less clarity about what students know and what to do about it.

Force 2: The Communication Trap

States often treat assessment communication as a messaging problem. It can be a design problem too. And it shows up in three ways.

  • The architecture problem. When a system includes a growth score from one vendor, a proficiency level from the state, and a percentile rank from a third product, parents receive three numbers and no way of making sense of them. The system architecture itself makes coherent communication nearly impossible.
  • The assumption problem. States that have been running a system for years tend to back off communication efforts, especially when budgets are tight. In our work, we see this pattern regularly: A state invests in communication during the first year or two of a new system, then scales it back as the system becomes routine. But staff turns over, new families enroll, and whatever understanding existed erodes. No one thinks about assessment as much as assessment professionals do.
  • The one-pager problem. When confusion mounts, the instinct is to create a clarifying document. This can be thought of as a rule of thirds: Some people read and understand, some read and leave even more confused, and plenty never open it. One-pagers pile up, either doing nothing or muddying the waters.

The bottom line: If your assessment system requires a decoder ring to understand it, no amount of communication will fix it. You cannot message your way out of a design problem.

Force 3: Flexibility as a Double-Edged Sword

Flexibility often provides the legal “permission” to add without subtracting.

The current federal landscape gives states a lot of flexibility, and as a result, many states are pursuing different approaches to assessment. A growing number are exploring through-year assessment models. Many have adopted college entrance exams as their primary high school assessment and are now exploring other approaches for high school testing requirements. Others are experimenting with state-approved vendor lists to provide districts with options for interim and screening tools.

This experimentation is worth watching. But flexibility without coordination has a predictable downside: fragmentation. And that fragmentation shows up in several ways.

The comparability problem. When neighboring districts in the same state use different interim systems, different screeners, and different benchmarks, educators and parents lose the ability to compare results across schools. A “proficient” score in one district may mean something entirely different in the next. State leaders who want to monitor equity or allocate resources based on need find themselves working with data that cannot be meaningfully aggregated.

The problem is not only comparisons across districts. It is also comparisons across measures that were never designed to align. A student who does not reach proficiency on their summative assessment may have been flagged as proficient on a screening tool. There is nothing wrong with either measure. They just were not designed to work together. You can have vinegar and baking soda in your kitchen. They each have a purpose. But if you mix them, it will be a mess.

The trust problem. When districts show better results on their locally administered interims than on the state summative, the disconnect creates tension. Parents wonder which number to believe. Teachers feel caught between two signals. The useful question is why the measures disagree: They may be built on different scales, measure different constructs, or have been administered under different conditions. Instead of asking it, the instinct is to add another assessment to reconcile the gap.

Where the Forces Converge

These three forces do not operate in isolation. The cycle of adding tools creates redundancy. The communication trap creates confusion. And flexibility without coordination creates fragmentation.

Running beneath all three is a problem that cuts across the entire system: lack of assessment literacy—not just for teachers, but for administrators, parents, and policymakers. People may not understand the purpose each assessment serves, whether it is appropriately aligned with its intended use, or how to act on the data it produces. Without that understanding, more tools, more flexibility, and more communication all produce the same result: more confusion.

Consider what this looks like on the ground. A 3rd grade teacher in a midsized district might administer a universal screener in the fall, a benchmark in the winter, and the state summative in the spring. Each uses a different scale. Each defines “proficiency” differently. The screener flags a student as on track. The benchmark flags the same student as at risk. The summative results arrive over the summer. The teacher was never trained on how these measures relate to each other, or whether they should. She is left to reconcile three signals on her own, and by the time the summative data arrive, her students have moved on to 4th grade. The system produced data at every turn. It did not produce understanding.

We cannot bridge these gaps with a single professional development day. We need a system in which teachers and policymakers actually know how to turn a test score into an instructional or policy move. Until then, more options will continue to mean less clarity.

What Progress Looks Like

Progress here is not about reaching an ideal end state. It is about taking the next credible step. These steps follow directly from the forces described above.

Reflect before you add. The assessment inventory is a starting point, not the destination. For every assessment in the system, ask two questions:

  • Who uses the data?
  • For what decision?

If you cannot answer both clearly, that assessment is a candidate for removal.
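For teams that keep their inventory in a spreadsheet or script, the two-question check can be sketched in a few lines of Python. This is only an illustration: the `Assessment` record, its field names, and the sample entries are hypothetical placeholders, not part of any WestEd tool or real district inventory.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """One row of a hypothetical assessment inventory."""
    name: str
    data_users: str = ""  # answer to "Who uses the data?"
    decision: str = ""    # answer to "For what decision?"


def removal_candidates(inventory):
    """Return names of assessments that cannot answer both questions."""
    return [
        a.name
        for a in inventory
        if not a.data_users.strip() or not a.decision.strip()
    ]


# Illustrative entries only; these are invented for the sketch.
inventory = [
    Assessment("Fall screener", "Reading specialists", "Assign intervention groups"),
    Assessment("Winter benchmark"),  # nobody could say who uses it or why
]

print(removal_candidates(inventory))
```

A blank answer is treated the same as a missing one, which mirrors the rule above: an assessment stays off the list only when both questions have a clear answer.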

One credible step is to adopt a sunsetting protocol, in which every assessment carries a literal expiration date. For example, every 3 years, ask: Can we show that the data are actually being used to help students? If we cannot point to specific decisions the measure has informed, the fence comes down by default.
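One way to picture the default-to-retire logic is the short Python sketch below. It assumes a 3-year review interval, taken from the example above, and a simple list of documented decisions per assessment. The `SunsetRecord` type, the function names, and the sample record are hypothetical, and a real protocol would involve human review rather than an automatic rule.

```python
from dataclasses import dataclass, field
from datetime import date

REVIEW_INTERVAL_YEARS = 3  # assumption: the example interval from the text


@dataclass
class SunsetRecord:
    """Hypothetical record tracking when an assessment was last reviewed."""
    name: str
    last_review: date
    documented_decisions: list[str] = field(default_factory=list)


def expiration_date(record):
    """Date on which the assessment expires unless it is renewed."""
    year = record.last_review.year + REVIEW_INTERVAL_YEARS
    try:
        return record.last_review.replace(year=year)
    except ValueError:
        # Last review fell on Feb 29 and the target year is not a leap year.
        return record.last_review.replace(year=year, day=28)


def sunset_status(record, today):
    """Return 'active', 'renew', or 'retire' for one assessment."""
    if today < expiration_date(record):
        return "active"
    # Past the expiration date: retire by default unless use is documented.
    return "renew" if record.documented_decisions else "retire"


# Illustrative record only; the name and dates are invented for the sketch.
legacy = SunsetRecord("Legacy benchmark", last_review=date(2012, 9, 1))
print(sunset_status(legacy, today=date(2026, 1, 15)))  # prints "retire"
```

The design choice worth noticing is the default: once the date passes, the burden of proof sits with keeping the assessment, not with removing it.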

Stop communicating after the fact. When selecting or building an assessment, require a communication plan as part of the design, not a one-pager that explains the scores after results come back. Develop the plan alongside the assessment, and preferably alongside parents, teachers, and students, and have it specify what each audience should understand from each measure and how that understanding will be built. Consider treating each year as a fresh rollout: Your audience turns over even when your system does not. And reach people through multiple channels: short videos, road shows, teacher ambassadors.

If the system architecture makes clear communication impossible, the architecture needs to change, because no amount of outreach will compensate for a system that produces contradictory signals.

Name the trade-offs before someone else does. Every design choice involves a trade-off. More local flexibility means less comparability. Richer measures mean longer tests. Faster turnaround may mean noisier scoring. States that name these trade-offs openly, before critics or the press discover them, build more trust than ones that pretend they can have everything. This also means being honest about what “balance” actually requires. We often treat the familiar balanced assessment pyramid as an end in itself, as if having formative, interim, and summative layers automatically creates a functional system. The goal is not to fill out a diagram; it is to design a system where every measure supports a shared, coherent purpose.

This post draws on WestEd’s work with state assessment systems, including discussions at the State Strategy Lab on Assessment and Accountability. For more on coherent and effective assessment systems, visit csaa.wested.org.

References

Arnold, J., & Webb, J. (2024). Key elements of a coherent and equitable local assessment system. Center for Standards, Assessment, & Accountability, WestEd.

Marion, S. F., Pellegrino, J. W., & Berman, A. I. (Eds.). (2024). Reimagining balanced assessment systems. National Academy of Education.