
Revising State Assessments: Balancing Efficiency and Quality 


By Sarah Quesen and Andy Latham

State education agencies (SEAs) are under pressure to shorten assessments. Lawmakers want to cut assessment costs, and teachers and families are pushing back on how much time testing takes. Although calls for more efficient assessments have been around for years, the momentum behind them has grown.

In July, the U.S. Department of Education released guidance allowing states to request waivers from certain Elementary and Secondary Education Act (ESEA) requirements, introducing several new waiver options beyond the Innovative Assessment Demonstration Authority (IADA) program that has been in place for the past several years. States can now propose ways to reduce testing time while still producing usable data.

SEAs want to modernize assessments. The challenge is how to do so more efficiently. The opportunity to waive some ESSA requirements raises important questions about fairness, long-term trend data, and the ways assessment results are used in policymaking.

Waiver Strategies for Reducing State Testing Time

A 2025 brief, Accountability and Assessment: Six Targeted Federal Waiver Ideas for Advancing Student-Centered Learning, outlines several ways states could rethink assessment and accountability. While some ideas focus on incorporating local or school-level indicators into accountability, others deal directly with assessment design. Two of the six stand out for states aiming to reduce testing time:

  • focus assessments on a smaller set of priority standards
  • use matrix sampling, a method in which students take different portions of the test

Both approaches can reduce testing time. Both come with trade-offs that need to be addressed up front.

Prioritizing State Standards to Reduce Test Length

This approach calls for states to identify a subset of standards as priorities for assessment. Rather than testing every standard, the test would focus on those considered most essential based on input from educators and other local voices. This could significantly reduce test length and potentially make room for more complex or authentic items that are hard to fit into traditional blueprints.

But narrowing the blueprint carries risks:

  • Instruction may narrow to what’s tested.
  • Trends over time may be harder to interpret if the priority set shifts.
  • Some learning gaps may go unnoticed if certain content is never assessed.

There’s also the issue of what’s implied. State standards are typically adopted as a full framework. Testing only a subset may suggest that the rest aren’t important, even if that’s not the intent.

If states move in this direction, the process for selecting priority standards should be research informed, teacher informed, transparent, and well justified. There should also be safeguards in place to monitor whether this shift limits content coverage or affects particular student groups disproportionately.

Sampling to Reduce Student Testing Time

For states that want to preserve full content coverage while reducing testing time, another path is matrix sampling. It is used by the National Assessment of Educational Progress (NAEP), which provides national-level data on student achievement. In a matrix-sampled design, each student takes only a portion of the test. Results are aggregated across students to produce school- or district-level scores or, in NAEP’s case, national-level reporting. The content is still fully covered at the system level, but no single student takes the entire test.
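To make the mechanics concrete, here is a minimal sketch of a matrix-sampled administration. The content-domain blocks, school names, and scores are hypothetical placeholders rather than any real state design: each simulated student is assigned one block, and only school-by-domain aggregates are reported.

```python
import random
from collections import defaultdict

# Hypothetical full blueprint: three content domains, each treated as one item block.
# In a matrix-sampled design, no single student takes all three blocks.
BLOCKS = ["number_sense", "algebraic_thinking", "data_and_measurement"]

def administer_matrix_sample(students, seed=0):
    """Assign each student one block and record a simulated percent-correct score."""
    rng = random.Random(seed)
    results = []
    for student in students:
        block = rng.choice(BLOCKS)          # each student sees only part of the test
        score = rng.uniform(0.3, 0.95)      # placeholder for scored item responses
        results.append({"school": student["school"], "block": block, "score": score})
    return results

def aggregate_by_school(results):
    """Roll block scores up to school-by-domain averages; only the aggregate is
    meaningful, because any one student's record covers too little content."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in results:
        key = (r["school"], r["block"])
        sums[key] += r["score"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Example: 200 simulated students across two schools.
students = [{"school": "School A"} for _ in range(100)] + [{"school": "School B"} for _ in range(100)]
for (school, block), mean in sorted(aggregate_by_school(administer_matrix_sample(students)).items()):
    print(f"{school} | {block}: {mean:.2f}")
```

The point is structural: the full blueprint is covered across the student population, but no individual student's record supports a score of its own.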

This design can reduce testing time while preserving trend data. It works well for system monitoring, but it’s not suitable for all use cases:

  • Individual scores aren’t reliable or comparable when students only see part of the test; the data are intended to be aggregated.
  • As with NAEP, scores can’t support student-level decisions or directly inform classroom instruction.
  • Students, teachers, and caregivers would not receive individual reports, which is a significant change for the field.

For states that still need student-level results, matrix sampling is not viable on its own. However, it can be combined with other interim and classroom-based tools designed for individual-level feedback to reduce end-of-year testing while maintaining consistent, sound statewide measurement.

Both approaches raise the same underlying challenge: how to design changes without undermining fairness or long-term data. That’s where simulation research can play an important role.

Using Simulation to Support Smarter Assessment Design

Designing shorter assessments isn’t just a policy decision. It requires careful modeling to understand how changes will affect score quality, fairness, and interpretation. Before moving forward, states need to understand how scores will be used, what claims they intend to make about student knowledge, and what stakes those claims carry. Leaders must ask the following questions:

  • Will shortened forms still provide reliable results for different student groups?
  • How much reliability is lost by shortening a form? What happens to classification accuracy? (See the sketch after this list.)
  • What are the considerations for validity if reducing to a priority set of standards or matrix sampling?
  • How will trend lines be maintained over time?
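For the reliability question in particular, one classical first-pass tool, not named in the post but standard in psychometrics, is the Spearman-Brown prophecy formula, which projects how reliability changes when a form is shortened by a given factor. The snippet below is a minimal sketch with illustrative numbers.

```python
def spearman_brown(reliability, length_factor):
    """Project the reliability of a form whose length changes by `length_factor`
    (e.g., 0.8 for a test that is 20 percent shorter), assuming the remaining
    items are of comparable quality to those removed."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Example: a full-length form with reliability 0.92, shortened by 20 percent.
print(round(spearman_brown(0.92, 0.8), 3))  # about 0.902, a modest overall loss
```

Overall reliability is only part of the picture, though: classification accuracy near cut scores and precision for specific student groups call for the fuller simulation work described next.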

Simulation and blueprint modeling tools can help answer many of these questions. These methods are not new and have long been used in large-scale test design. What has changed is the complexity of the design choices and the need to evaluate trade-offs before implementation. Newer techniques, including machine learning, can expand this work by modeling how different blueprints affect score precision for student groups, simulating how sampling would perform across a state, or identifying item combinations that maintain coverage within a shorter form.
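As one small illustration of the blueprint-modeling idea, the sketch below uses a hypothetical item pool and a simple greedy rule to choose a shorter form that still meets a minimum number of items per content domain. Operational work would rely on real item statistics and far more sophisticated selection and equating methods.

```python
from collections import Counter

# Hypothetical item pool: each item belongs to a domain and carries an "information"
# value standing in for whatever item statistics a state actually uses.
item_pool = [
    {"id": f"item_{i:02d}", "domain": domain, "info": info}
    for i, (domain, info) in enumerate([
        ("number_sense", 0.90), ("number_sense", 0.70), ("number_sense", 0.60),
        ("algebraic_thinking", 0.80), ("algebraic_thinking", 0.75), ("algebraic_thinking", 0.50),
        ("data_and_measurement", 0.85), ("data_and_measurement", 0.65),
    ])
]

def build_short_form(pool, min_per_domain=2, max_items=6):
    """Greedily keep the most informative items while guaranteeing that each domain
    still appears at least `min_per_domain` times on the shortened form."""
    chosen, counts = [], Counter()
    # First pass: satisfy domain minimums with the best items within each domain.
    for item in sorted(pool, key=lambda x: -x["info"]):
        if counts[item["domain"]] < min_per_domain:
            chosen.append(item)
            counts[item["domain"]] += 1
    # Second pass: fill any remaining slots with the most informative leftovers.
    for item in sorted(pool, key=lambda x: -x["info"]):
        if len(chosen) >= max_items:
            break
        if item not in chosen:
            chosen.append(item)
    return chosen

print([item["id"] for item in build_short_form(item_pool)])
```

Even a toy rule like this makes the trade-off visible: shortening a form forces explicit choices about which domains keep how many items.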

Using Models to Answer Assessment Redesign Questions

Example: States can use existing test data and machine learning to identify potential risks introduced by reduced forms.

  1. Train and validate on full-form test data
    • Use complete assessments with known classifications or scores
    • Include features such as grade, prior scores, and demographics
    • Ensure the model learns to predict classifications from full responses
  2. Simulate reduced-form performance
    • Select items to mimic a shorter test
    • Predict classifications using the reduced form
    • Compare accuracy between reduced and full forms
  3. Analyze misclassification patterns
    • Model misclassification risk with regression
    • Identify student groups most likely to be affected

These tools allow redesign decisions to be grounded in evidence. For example, a simulation can estimate how a 20 percent reduction in test length might affect reporting accuracy for English Learners or students with disabilities before any changes are made. This analysis can also flag which student groups are most likely to be misclassified under a reduced design.

Test design changes affect more than testing time. They shape what gets taught, what gets reported, and who gets seen in the data. During the pandemic, many states offered shortened or simplified test forms. Afterward, there was a reluctance to return to full-length tests. Reversing course on a redesign involves both political and logistical complications. Doing exploratory research up front can help ensure that changes are both defensible and sustainable.

WestEd’s Role in Supporting State Assessment Redesign

WestEd’s assessment team works with SEAs to design and evaluate assessment systems that balance efficiency and quality while keeping a careful eye on fairness. We help states

  • simulate the impact of reduced forms on measurement quality,
  • design and validate matrix-sampling plans,
  • audit test blueprints for fairness and student representation, and
  • prepare evidence for peer review or waiver submissions.

Assessment systems are changing. Some states are taking the lead, while others are waiting to see what works. In either case, shorter assessments are likely to come. With the right design, states can streamline testing while preserving meaningful results across student groups.

Learn more about our assessment work and partner with us.

Sarah Quesen is an expert in statistics and psychometrics with a keen interest in emerging technologies. As Director of Assessment Research and Innovation (ARI), she leverages her understanding of assessment systems to lead rigorous, transformative research and provide evidence-based technical assistance to states, districts, and commercial organizations.

Andy Latham is the Vice President of Science and Assessment at WestEd. He is particularly interested in refining current testing models to collect valid, actionable information from students as efficiently and cost-effectively as possible.
