School systems regularly use student assessments for accountability purposes. But, as our conceptual model highlights, different configurations of assessment usage generate performance-conducive incentives of different strengths for different stakeholders in different school environments. We build a dataset of over two million students in 59 countries observed across six waves of the international PISA student achievement test (2000-2015). Our empirical model exploits the country panel dimension to investigate reforms in assessment systems over time, with identification coming from country and year fixed effects along with a rich set of student, school, and country measures. We find that the expansion of standardized external comparisons, both school-based and student-based, is associated with improvements in student achievement. The effect of school-based comparison is stronger in countries with initially low performance. Similarly, standardized monitoring without external comparison has a positive effect in initially poorly performing countries. By contrast, the introduction of solely internal testing and internal teacher monitoring, including inspectorates, does not affect student achievement. Our findings point to the pitfalls of overly broad generalizations from specific countries' testing systems.
We gratefully acknowledge comments from participants at the Spring Meeting of Young Economists, the BGPE Research Workshop, and the center seminar of the ifo Center for the Economics of Education. This work was supported by the Smith Richardson Foundation. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.