By Marianne Perie, Director of Assessment Research and Innovation at WestEd. Perie is responsible for shaping the agency’s psychometric and assessment research. She provides deep measurement expertise that draws on her more than two decades of experience working to improve educational equity through high-quality research.
Many people think a standardized assessment means a long, multiple-choice test that takes time out of instruction and punishes teachers. In my mind, standardized assessment means we want to know how students are doing from an educational standpoint on a metric that allows us to compare their knowledge to other students in their grade, school, district, state, or country.
In this post, I review the recent history of school assessment and accountability and provide eight recommendations for consideration in the reauthorization of the Elementary and Secondary Education Act (ESEA).
A Brief Overview of K–12 Testing
Before 2001, no one knew what a psychometrician was or that test design was an actual profession. After 2001, when people asked me what I did, I could say, “I help states develop tests for No Child Left Behind (NCLB),” and people knew what I was talking about. Then came the Common Core State Standards, followed closely by Race to the Top and teacher evaluation based on tests.
The Every Student Succeeds Act (ESSA), passed in 2015, removed many of the punitive consequences of NCLB related to assessment and accountability. Specifically, ESSA places greater emphasis on flexibility in decision-making and funding options related to assessments, comprehensive accountability systems, and mechanisms for identifying and supporting schools in need of improvement. In addition to state assessments, schools could use other indicators to define success, such as student volunteerism or school climate and safety. Unfortunately, the damage had been done. Linking test scores to teacher evaluation under the NCLB waivers had eroded trust between the assessment community and teachers.
An additional promise of ESSA was the inclusion of the Innovative Assessment Demonstration Authority (IADA). This provision allowed up to seven states to be awarded flexibility to try out a new, innovative approach to assessing students. The intent was promising, as it gave states a chance to try out innovations such as performance tasks, student portfolios, and through-year assessments. Unfortunately, the restrictions that still existed under ESSA hindered states’ abilities to push innovation very far. Pilot districts were required to administer both the innovative and traditional assessments unless states could prove the two were comparable. However, to be comparable, the innovation had to be restricted. In the end, only five states applied for and were awarded the flexibility; two states later dropped out.
Despite the improvements made at the federal level in recent years to statewide standardized testing, many still question the value of yearly standardized testing in schools. Standardized testing and participation requirements did make a difference for students at risk, as schools were no longer able to ask them to stay home on testing day. However, much work remains to be done to improve equity within and across our schools. Students who are Black, Indigenous, and Hispanic graduate high school at lower rates than their White and Asian peers and require remedial coursework in college more often. What is more, the costs and time associated with assessments, delayed results, and failure of tests to improve students’ academic results leave many to wonder if they are worth the effort at best and, at worst, if they harm students and punish teachers and schools.
So, where do we go from here? First, we need to narrow the focus of the assessments back to their original intent: to measure school and district success at teaching students to the level of proficiency. The length of tests is directly related to the number of purposes they try to fulfill. The purpose of school accountability assessments became quite convoluted during the NCLB years when the uses kept adding up: (1) divide kids into three levels (below, at, and above proficiency); (2) measure student growth over the years; (3) attribute scores to teachers and use them to evaluate their effectiveness; and (4) provide diagnostic feedback to teachers and parents. I would argue that only the first two uses are worth continuing. Many states have dropped the teacher evaluation component, and those that have not do not need lengthy tests to add a data point to their system. Other tests (namely, diagnostic interims) do a much better job of providing finer-grained feedback to teachers and parents.
Considerations for Education Leaders and Policymakers
Education leaders and policymakers might consider the following policies to allow for more innovation and flexibility within state summative assessments.
Examine the impact of previous waiver efforts. At the time of this writing, Congress is considering a waiver process for ESSA, much as was done during the NCLB years while waiting for the right time to reauthorize ESEA. Before doing so, we should analyze the impact of the NCLB waivers. What was accomplished? How did it impact schools and states and, most importantly, student achievement? Policymakers should understand that once waiver applications are released, state assessment directors will be required to focus their attention on those applications for months at a time. We need to ensure their time is spent requesting flexibility that improves schools.
Allow for flexibility at the high school level. Test-based accountability has never worked well in high school with only a single grade assessed. First, schools are only required to test in one grade of high school, making growth in high school impossible to measure. Second, mathematics is not a subject that fits neatly into grades 9–12. Many districts separate math into specific courses, such as Algebra I, geometry, and Algebra II, and allow the students to take them when ready. This practice results in no course that all students take in one grade and requires states to bank test scores. Other districts teach mathematics in a more integrated approach, which does allow for easier grade-level testing but is a nightmare for states that have districts adopting each model. Instead, we could examine the activities that lead to postsecondary success, such as advanced coursework, dual enrollment, apprenticeships, and badging, and restructure high school accountability around those measures.
Sample the content by students. If the purpose of accountability tests is limited to assessing student proficiency and growth over time, we can drastically shorten the test. Each student would only receive a score and achievement level, but we would be able to make the same evaluation of schools and districts. Currently, ESSA requires that every student be assessed on the full depth and breadth of the standards. The purpose of that requirement is to ensure every student is taught the full depth and breadth of the standards; however, we do not need to test them all to incentivize that behavior. We can test the depth and breadth of the standards across the school but only require each student to be tested on some of them. Using a random sample design, neither the student nor the teacher will know which student will be assessed on which standards, requiring all students to be taught all standards. By not requiring students to be tested on the full depth and breadth of all standards and by not reporting subscores, we can get reliable scores in half the time we take to assess students today.
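One way to picture the sampling idea above is a short sketch, written here in Python. Everything in it is an illustrative assumption rather than a description of any state's actual design: the standard labels, the class of 30 students, the choice to give each student half of the standards, and the function name `assign_standards` are all hypothetical. The approach is commonly called matrix sampling. A real design would also need to balance item difficulty and content across forms, which this sketch ignores.

```python
import random
from collections import Counter


def assign_standards(student_ids, standards, per_student, seed=None):
    """Randomly give each student a subset of standards to be tested on.

    Standards are shuffled into a random cycle and dealt out in turn, so
    every standard is covered about equally often across the school while
    no one can predict which student receives which standards.
    """
    if not 0 < per_student <= len(standards):
        raise ValueError("per_student must be between 1 and len(standards)")

    rng = random.Random(seed)

    # Random order of standards and of students (hypothetical inputs).
    order = list(standards)
    rng.shuffle(order)
    students = list(student_ids)
    rng.shuffle(students)

    n = len(order)
    assignment = {}
    for i, student in enumerate(students):
        # Take the next `per_student` standards from the cycle; because
        # per_student <= n, a student never gets the same standard twice.
        start = (i * per_student) % n
        assignment[student] = [order[(start + j) % n] for j in range(per_student)]
    return assignment


# Hypothetical school: 30 students, 12 standards, each student tested on 6.
students = [f"student_{i:02d}" for i in range(30)]
standards = [f"STD.{i}" for i in range(1, 13)]
forms = assign_standards(students, standards, per_student=6, seed=2024)

# How many students are tested on each standard across the school.
coverage = Counter(s for form in forms.values() for s in form)
print(sorted(coverage.items()))
```

In this made-up example each student sees only half of the standards, yet every standard is still assessed across the school, which is the property the recommendation relies on for school and district evaluation.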
Simplify parent reports. If the tests are shortened, we can produce straightforward reports that tell a parent how a student scored along with their performance level. Strong performance level descriptions (PLDs) will provide criterion information about the student’s general knowledge and skills. We can then provide normative data to compare the student’s score to the other students in their grade level at their school, district, and state. We can, and must, continue to disaggregate school, district, and state data by the various student groups. We need to continue to monitor the progress of our students with disabilities, English learners, and our traditionally marginalized students and use that data to continue to pursue effective methods of teaching and supporting those students.
Redesign the peer review process. Far too much of state assessment directors’ time is spent submitting evidence regarding their tests, only to wait months—and sometimes over a year—for feedback. Peer review has become a lockstep checklist of rigorous requirements that do not necessarily result in better tests.
Improve feedback to educators. Specifically, move the emphasis from the now-shortened summative test into through-year diagnostic testing to give teachers better feedback throughout the year. However, we need to research what makes a diagnostic test effective and leads to improved teaching and learning. Districts spend millions of dollars on the existing products, but we have little evidence that they improve teachers’ grasp of what students understand or help improve student learning.
Invest in developing a process to ensure tests are reflective of authentic, lived experiences. Give kids context they can relate to and find interesting. Dry, unfamiliar contexts limit students’ ability to engage. There is evidence that students perform better when they are engaged, so let’s set our test questions in contexts they find engaging and relevant while remaining true to the standards students are learning (Scheidler, 2012). Let’s also pursue further research into using technology and gaming software that already show promise in “stealth” assessment.
Focus on ensuring literacy and numeracy skills are sufficient before students get to 3rd grade. Longitudinal analyses have shown that students who are well below proficiency in 3rd grade often never recover (Hernandez, 2012). We do not need federally mandated testing in early grades, but proper incentives could encourage states to work with their districts and schools to adopt strong measures and provide interventions earlier.
Teachers and parents need data but are weary of current assessment and accountability systems. We need to fix those systems to continue our focus on equitable access and outcomes while embracing innovation and improvement.
Hernandez, D. J. (2012). Double jeopardy: How third-grade reading skills and poverty influence high school graduation. Baltimore, MD: The Annie E. Casey Foundation. Retrieved from https://www.aecf.org/resources/double-jeopardy
Scheidler, M. J. (2012). The relationship between student engagement and standardized test scores of middle school students: Does student engagement increase academic achievement? Retrieved from the University of Minnesota Digital Conservancy. https://hdl.handle.net/11299/143657