Sarah Quesen

Director, Assessment Research and Development

Overview

Sarah Quesen is an expert in statistics and psychometrics with a keen interest in emerging technologies. As Director of Assessment Research and Development at WestEd, she leads the agency’s assessment team. The team’s portfolio brings together assessment design, content development, psychometric research, scoring and reporting, and assessment literacy. She partners with states, districts, and organizations to build coherent assessment systems that support meaningful instructional and policy decisions, including accountability.

Quesen’s work centers on designing assessments with a clear purpose. She focuses on matching assessment type to intended use; ensuring tight alignment across standards, items, and reporting; and strengthening the link between design decisions and classroom practice. Her research emphasizes gathering and evaluating validity evidence to support how assessment scores are used. She also has experience in evaluating AI scoring, with a particular focus on comparability between human and machine ratings across student groups. 

Current projects include a framework for evaluating automated scoring; studies of GPT-generated assessment content; machine-learning methods for examining score comparability across tools; and multistage validation studies in early literacy and numeracy that support screening, progress monitoring, and measurement-informed intervention in the early grades. 

Quesen has more than 25 years of experience as an educator and continues to serve on the teaching faculty in the Department of Statistics at the University of Pittsburgh. Prior to joining WestEd, she was a senior research scientist at Pearson, where she served as lead psychometrician on complex, large-scale assessment contracts. Her extensive experience in both academic and operational settings drives her commitment to disrupting outdated assessment practices. 

Education

  • PhD in research methodology, University of Pittsburgh 
  • MPH in public health, West Virginia University School of Medicine
  • BS in culture and communication, New York University

Select Publications

Brunetti, M., Langi, M., & Quesen, S. (2025). Are we on the same page? A discussion on the use and misuse of early literacy assessments. https://doi.org/10.31219/osf.io/ze3qj_v2 

LeBeau, B., Clarke, M., Rodriguez, B., & Quesen, S. (2025). Fine-tuning models in a secure AI environment: Technical implementation (Spotlight Brief). Data Integration Support Center, WestEd. 

Quesen, S., & LeBeau, B. (2025, April). Pairwise no more: Rethinking bias detection methods for complex intersectional data [Paper presentation]. National Council on Measurement in Education Annual Meeting, Denver, CO, United States. 

White, L., Nesbitt, J., Roeters-Solano, H., Quesen, S., Lottridge, S., & Lochbaum, K. (2025). Culturally responsive and sustaining approaches to scoring. In C. Evans & C. Taylor (Eds.), Culturally responsive assessment in classrooms and large-scale contexts: Theory, research, and practice. NCME. 

Murphy, D., Quesen, S., Brunetti, M., & Love, Q. (2024). Expected classification accuracy for categorical growth models. Educational Measurement: Issues and Practice, 43(2), 64–73. 

Quesen, S., Armstrong, P., & Timberlake, A. (2024, June 24–26). Leveraging generative AI in developing assessments that reflect all learners in your state [Paper presentation]. National Conference on Student Assessment, Seattle, WA, United States.

Quesen, S., & Lane, S. (2019). Differential item functioning for accommodated students with disabilities: Effect of differences in proficiency distributions. Applied Measurement in Education, 32(4), 337–349.

Lochbaum, K., Quesen, S., Workman, T., Zurkowski, J., & Hauger, J. (2018). Faster and better: The continuous flow approach to automated scoring [Paper presentation]. National Conference on Student Assessment, San Diego, CA, United States. 

Steedle, J., Quesen, S., & Boyd, A. (2017). Longitudinal study of external validity of the PARCC performance levels: Phase I report. Pearson.
