In the past, teacher evaluations at many schools were cursory exercises, consisting of little more than simple checklists that did not reflect the complexity of teachers’ instruction. And, just like the children of A Prairie Home Companion’s fictional Lake Wobegon, all teacher performance was generally viewed as above average — or at least satisfactory.

Then, beginning around 2009, new federal policies called for more rigorous approaches to teacher accountability and evaluation that integrated multiple measures of teacher effectiveness. As states and districts began overhauling their teacher evaluation models accordingly, they grappled with critical questions: How do you accurately assess teacher performance? Are student test scores a valid way to measure the impact of a teacher’s instruction? How reliable are classroom observations in distinguishing between effective and ineffective teachers?

“States were hungry for research and guidance on the new teacher performance measures they were developing or adopting,” says Reino Makkonen, a senior policy associate at the Regional Educational Laboratory (REL) West at WestEd. To help states address key issues as they were laying the groundwork for their new teacher evaluation systems, REL West conducted a series of studies examining various performance measures. As the work progressed, staff collaborated with education officials in Arizona, Nevada, and Utah to help them use the research findings to inform and refine their teacher evaluation systems.

“No measure of teacher performance is perfect. All contain some error,” says Makkonen, who leads the REL West team dedicated to educator effectiveness issues. “But when administrators and teachers sit down to review and discuss the different types of data together, they can often reach a good understanding of where the teacher is and what some appropriate next steps for improvement might be.”

Measuring teacher effectiveness

To help administrators consider the various benefits and challenges of different measures of teacher performance, REL West staff synthesized findings from both their own research and literature from the field to develop a two-page logic model and an animated video, both titled Making Meaningful Use of Teacher Effectiveness Data. Below are a few highlights.

Observation-based measures. Although time- and labor-intensive, classroom observation is a direct, credible form of assessment. For example, the observer can note the structure and pacing of the lesson and the kinds of discussion techniques the teacher uses to engage students. The measure becomes more reliable with multiple observations and multiple observers, an important consideration given that scoring by principals alone has tended to show little variation among teachers.

Measures of teacher contribution to student learning. Using standardized test scores to measure teachers’ influence on student learning does not require extra work from principals or teachers. However, those scores do not reliably isolate the reasons for students’ results, which may stem from teacher instruction, peer influence, school factors, or other aspects of children’s school, home, and community life. As a result, they have proven to be of limited value for assessing teacher performance.

End-of-year scores from student learning objectives (SLOs) — which are set by teachers and their principal to measure classroom-specific student achievement growth — were found to differentiate between high- and low-performing teachers in a REL West study. However, SLOs are not standardized or comparable across contexts.

Student perceptions of teacher effectiveness (surveys). Students have daily contact with teachers, and students’ ratings of teachers have been shown to be consistent from year to year and across different classrooms of students. Surveys may also play a helpful role given rising interest in including social and emotional indicators in accountability systems, says Makkonen: “Teachers are trying to create engaging, supportive environments in the classroom, so we should consider trusting students to provide useful feedback.” At the same time, because students are not trained to assess curriculum, classroom management, or content knowledge, their observations in these areas may have limited value.

Building an infrastructure for data review and feedback

While REL West initially concentrated on helping states and districts better understand the various measures of teacher performance in order to develop and refine their evaluation systems, the landscape has shifted, says Makkonen. Now that many of these systems are up and running and schools have begun collecting multiple types of teacher performance data, the overriding question has become: What exactly should administrators do with all these data? Accordingly, REL West’s recent work has moved toward a more explicit focus on the practical uses of teacher evaluation data at the district and school-site levels.

“Are we building a data museum?” Makkonen remembers an overwhelmed principal asking in relation to the multiple streams of teacher evaluation data that his school was collecting. The principal worried that huge swaths of data would end up sitting unused in various spreadsheets and databases, gathering virtual dust. To get a better understanding of how districts are tackling practical and logistical issues — like the principal’s concern about data accessibility — REL West studied five districts in Arizona to examine how they organize and use their teacher evaluation data. One of the main takeaways from the Arizona study, notes Makkonen, is that schools and districts are wise to first focus on building an infrastructure for data review and feedback. “To be useful to educators and administrators, the right data must be available at the right time and in the right format.”

For example, some districts have built data dashboards, which function essentially as teacher “report cards,” displaying results from classroom observations, student assessments, and any student surveys administered. Such dashboards organize the different types of data rather than leaving them scattered across separate databases or delivered in hard-to-interpret formats.

“Administrators seem to find such dashboards useful as organizers of disparate streams of data,” says Makkonen. “Having the information all in one place can also facilitate rich feedback conversations between coaches, principals, and teachers.”

Learning to use data to improve teacher practice

Another key finding from the Arizona study: Evaluation data influence the professional development opportunities subsequently offered to teachers. Officials from all five districts in the study reported that they used their standards-based instructional frameworks and observation rubrics to identify teachers’ strengths and weaknesses across multiple domains, and plan professional learning accordingly. The study also found that classroom observation data were seen as more useful than student test scores for informing professional development decisions because results from multiple observations are collected and accessible throughout the school year, while statewide student test scores are often not released until the summer.

Using teacher performance data for targeted decision-making — such as what type of professional development to offer teachers — has been the focus of a series of workshops conducted in 2016 by REL West, in collaboration with the Center on Great Teachers and Leaders. The workshops have been specifically designed for principals, says Makkonen, because principals are increasingly asked to play a key role in supporting teachers. [Access a free video-based version of the workshop, with related materials.]

“There are more demands on school administrators now,” says Makkonen. “It’s no longer enough to just manage the logistics of the site. More and more, administrators are being positioned as instructional leaders or coaches. Even if they are not experts in pedagogy or specific academic content areas, they can create the conditions for teachers to receive constructive, data-informed feedback about their instruction.”

In addition to helping principals learn to use teacher performance data constructively in these kinds of professional conversations, the workshops give principals hands-on experience in cataloging and analyzing the data they have available. Participants learn to use these data to make a variety of decisions, including assigning teachers to appropriate grades and classes and identifying potential teacher leaders and mentors.

Making meaningful use of teacher effectiveness data is more important than ever, says Makkonen, because many regions are facing teacher shortages. As a result, districts’ and schools’ focus is shifting from using evaluation measures primarily for teacher accountability toward using them to inform decisions that help support, retain, and improve current teachers. To achieve these goals, says Makkonen, administrators need to ask, “Are we using the information we’ve gathered to inform conversations and actions that create a more supportive environment for teachers, so they don’t feel lost and frustrated in their work?”

Ultimately, says Makkonen, teacher evaluation is a process of continuous improvement. “It’s not about trying to get a perfect measurement in order to rank teachers. It’s about improving the workforce.”

The Institute of Education Sciences at the U.S. Department of Education has funded all of REL West’s research. REL West also collaborates closely with the West Comprehensive Center, which has helped disseminate findings from the research.